GiST API Adancement

Started by Yuan Dongover 8 years ago6 messages
#1Yuan Dong
Doffery@hotmail.com

Hi hackers,

I want to hack a little. I applied for GSoC project (https://wiki.postgresql.org/wiki/GSoC_2017#GiST_API_advancement), but now I'm going to hack on my own. With the help of Andrew Borodin, I want to start the project with adding a third state to collision check. The third state is that:
subtree is totally within the query. In this case, GiST scan can scan down subtree without key checks.

After reading some code, I get this plan to modify the code:

1. Modify the consistent function of datatypes supported by GiST.

1.1 Start with cube, g_cube_consistent should return a third state when calling g_cube_internal_consistent. Inspired by Andrew Borodin, I have two solutions, 1)modify return value, 2) modify a reference type parameter.

2. Modify the gistindex_keytest in gistget.c to return multiple states

2.1 Need declare a new enum type(or define values) in gist_private.h or somewhere

3. Modify the gitsScanPage in gistget.c to deal with the extra state

4. Add a state to mark the nodes under this GISTSearchItem are all with in the query

4.1 Two ways to do this: 1) add a state to GISTSearchItem (prefered) 2) Somewhere else to record all this kind of items

4.2 We may use the block number as the key

4.3 Next time when the gistScanPage met this item, just find all the leaves directly(need a new function)

After this, I shall start to benchmark the improvement and edit the code of other datatypes.

Hope you hackers can give me some suggestions~

--
Dong

#2Andrew Borodin
borodin@octonica.com
In reply to: Yuan Dong (#1)
Re: GiST API Adancement

Hi, Dong!

2017-06-15 21:19 GMT+05:00 Yuan Dong <Doffery@hotmail.com>:

I'm going to hack on my own. With the help of Andrew Borodin, I want to
start the project with adding a third state to collision check. The third
state is that:
subtree is totally within the query. In this case, GiST scan can scan down
subtree without key checks.

That's fine!

After reading some code, I get this plan to modify the code:

1. Modify the consistent function of datatypes supported by GiST.

1.1 Start with cube, g_cube_consistent should return a third state when
calling g_cube_internal_consistent. Inspired by Andrew Borodin, I have two
solutions, 1)modify return value, 2) modify a reference type parameter.

2. Modify the gistindex_keytest in gistget.c to return multiple states

2.1 Need declare a new enum type(or define values) in gist_private.h or
somewhere

3. Modify the gitsScanPage in gistget.c to deal with the extra state

4. Add a state to mark the nodes under this GISTSearchItem are all with in
the query

4.1 Two ways to do this: 1) add a state to GISTSearchItem (prefered) 2)
Somewhere else to record all this kind of items

4.2 We may use the block number as the key

4.3 Next time when the gistScanPage met this item, just find all the leaves
directly(need a new function)

After this, I shall start to benchmark the improvement and edit the code of
other datatypes.

Hope you hackers can give me some suggestions~

I think there is one more aspect of development: backward
compatibility: it's impossible to update all existing extensions. This
is not that major feature to ignore them.

Though for benchmarking purposes backward compatibility can be omitted.

Best regards, Andrey Borodin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Yuan Dong
Doffery@hotmail.com
In reply to: Andrew Borodin (#2)
答复: GiST API Adancement

Hi Andrey,

Thank you for your suggestion.

I should still maintain original API of GiST after modification.

--

Dong

________________________________
发件人: Andrew Borodin <borodin@octonica.com>
发送时间: 2017年6月16日 6:24:03
收件人: Yuan Dong
抄送: pgsql-hackers@postgresql.org
主题: Re: GiST API Adancement

Hi, Dong!

2017-06-15 21:19 GMT+05:00 Yuan Dong <Doffery@hotmail.com>:

I'm going to hack on my own. With the help of Andrew Borodin, I want to
start the project with adding a third state to collision check. The third
state is that:
subtree is totally within the query. In this case, GiST scan can scan down
subtree without key checks.

That's fine!

After reading some code, I get this plan to modify the code:

1. Modify the consistent function of datatypes supported by GiST.

1.1 Start with cube, g_cube_consistent should return a third state when
calling g_cube_internal_consistent. Inspired by Andrew Borodin, I have two
solutions, 1)modify return value, 2) modify a reference type parameter.

2. Modify the gistindex_keytest in gistget.c to return multiple states

2.1 Need declare a new enum type(or define values) in gist_private.h or
somewhere

3. Modify the gitsScanPage in gistget.c to deal with the extra state

4. Add a state to mark the nodes under this GISTSearchItem are all with in
the query

4.1 Two ways to do this: 1) add a state to GISTSearchItem (prefered) 2)
Somewhere else to record all this kind of items

4.2 We may use the block number as the key

4.3 Next time when the gistScanPage met this item, just find all the leaves
directly(need a new function)

After this, I shall start to benchmark the improvement and edit the code of
other datatypes.

Hope you hackers can give me some suggestions~

I think there is one more aspect of development: backward
compatibility: it's impossible to update all existing extensions. This
is not that major feature to ignore them.

Though for benchmarking purposes backward compatibility can be omitted.

Best regards, Andrey Borodin

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Yuan Dong (#3)
Re: 答复: GiST API Adancement

Yuan Dong <Doffery@hotmail.com> writes:

·¢¼þÈË: Andrew Borodin <borodin@octonica.com>

I think there is one more aspect of development: backward
compatibility: it's impossible to update all existing extensions. This
is not that major feature to ignore them.

I should still maintain original API of GiST after modification.

To put a little context on that: we have always felt that it's okay
to require extension authors to make small adjustments when going to
a new major release. For instance, adding a new parameter to a
globally visible function is fine, especially if callers can just
pass NULL or some such to get the old behavior. So in the context
here, you shouldn't feel compelled to come up with a bizarre API
design just to preserve exact compatibility of old code. You should
indeed think about reducing the amount of work that extension
authors have to do to update, but that doesn't have to mean "zero".
Also, it's wise to make sure that any places where code changes
have to be made will result in compile errors if the change isn't
made.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Andrew Borodin
borodin@octonica.com
In reply to: Tom Lane (#4)
Re: [HACKERS] 答复: GiST API Adancement

2017-06-16 17:06 GMT+03:00 Tom Lane <tgl@sss.pgh.pa.us>:

Yuan Dong <Doffery@hotmail.com> writes:

·¢¼þÈË: Andrew Borodin <borodin@octonica.com>

I think there is one more aspect of development: backward
compatibility: it's impossible to update all existing extensions. This
is not that major feature to ignore them.

I should still maintain original API of GiST after modification.

To put a little context on that: we have always felt that it's okay
to require extension authors to make small adjustments when going to
a new major release.

Then maybe it worth to consider more general API advancement at once?
Here's my wish list:
1. Choose subtree function as an alternative for penalty
2. Bulk load support
3. Better split framework: now many extensions peek two seeds and
spread split candidates among them, some with subtle but noisy
mistakes
4. Better way of key compares (this thread, actually)
5. Split in many portions

Best regards, Andrey Borodin.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Yuan Dong
Doffery@hotmail.com
In reply to: Andrew Borodin (#5)
答复: [HACKERS] 答复: GiST API Adancement

2017-06-16 17:06 GMT+03:00 Tom Lane <tgl@sss.pgh.pa.us>:

Yuan Dong <Doffery@hotmail.com> writes:

·¢¼þÈË: Andrew Borodin <borodin@octonica.com>

I think there is one more aspect of development: backward
compatibility: it's impossible to update all existing extensions. This
is not that major feature to ignore them.

I should still maintain original API of GiST after modification.

To put a little context on that: we have always felt that it's okay
to require extension authors to make small adjustments when going to
a new major release.

Tom's comment is very helpful, I will keep it in mind.

Then maybe it worth to consider more general API advancement at once?
Here's my wish list:
1. Choose subtree function as an alternative for penalty
2. Bulk load support
3. Better split framework: now many extensions peek two seeds and
spread split candidates among them, some with subtle but noisy
mistakes
4. Better way of key compares (this thread, actually)
5. Split in many portions

Okay, I would try to do more advancements step by step.

I do not understand the problem of choosing subtree as an alternative for penalty. What do yo mean by choosing subtree function?

There will be more questions coming, hope you will be comfortable with that~

Best wishes.

--

Dong

________________________________
发件人: Andrew Borodin <borodin@octonica.com>
发送时间: 2017年6月16日 16:48:04
收件人: Tom Lane
抄送: Yuan Dong; pgsql-hackers@postgresql.org
主题: Re: [HACKERS] 答复: GiST API Adancement

2017-06-16 17:06 GMT+03:00 Tom Lane <tgl@sss.pgh.pa.us>:

Yuan Dong <Doffery@hotmail.com> writes:

·¢¼þÈË: Andrew Borodin <borodin@octonica.com>

I think there is one more aspect of development: backward
compatibility: it's impossible to update all existing extensions. This
is not that major feature to ignore them.

I should still maintain original API of GiST after modification.

To put a little context on that: we have always felt that it's okay
to require extension authors to make small adjustments when going to
a new major release.

Then maybe it worth to consider more general API advancement at once?
Here's my wish list:
1. Choose subtree function as an alternative for penalty
2. Bulk load support
3. Better split framework: now many extensions peek two seeds and
spread split candidates among them, some with subtle but noisy
mistakes
4. Better way of key compares (this thread, actually)
5. Split in many portions

Best regards, Andrey Borodin.