GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

Started by Andrey Borodinalmost 10 years ago36 messageshackers
Jump to latest
#1Andrey Borodin
amborodin@acm.org

Hi, hackers!

I think there is some room for improving GiST inserts. Following is
the description of what I think is performance problem.

Function gistplacetopage in file /src/backend/access/gist/gist.c is
responsible for updating or appending new tuple on GiST page.
Currently, after checking necessity of page split due to overflow, it
essentially executes following:

if (OffsetNumberIsValid(oldoffnum))
PageIndexTupleDelete(page, oldoffnum);
gistfillbuffer(page, itup, ntup, InvalidOffsetNumber);

That is: remove old tuple if it’s there, then place updated tuple at the end.

Half of the old data have to be shifted my memmove inside
PageIndexTupleDelete() call, half of the linp-s have to be corrected.

If the updated tuple has same size as already residing on page we can
just overwrite it. Attached patch demonstrates that concept. Attached
test.sql inserts million rows into GiST index based on cube extension.
My machine is Hyper-V VM running Ubuntu on i5-2500 CPU with SSD
storage. Before patch, insert part of test is executed on average
within 159 second, after patch application: insert part is executed
within 77 seconds on average. That is almost twice faster (for
CPU\Mem-bounded inserts, disk-constrained test will show no
improvement). But it works only for fixed-size tuple inserts.

I know that code in patch is far from beautiful: it operates with
three different levels of abstraction within 5 lines of code. Those
are low level memmove(), system-wide PageAddItem() and GiST private
gistfillBuffer().

By the way PageAddItem() have overwrite flag, but it only works with
unused ItemId’s. Marking old ItemId as unused before PageAddItem()
didn’t work for me. Unfortunately bufpage.c routines do not contain
one for updating(replacing with new) tuple on page. It is important
for me because I’m working on advanced GiST page layout (
/messages/by-id/CAJEAwVE0rrr+OBT-P0gDCtXbVDkBBG_WcXwCBK=GHo4fewu3Yg@mail.gmail.com
), current approach is to use skip-tuples which can be used to skip
many flowing tuples with one key check. Obviously, this design cares
about tuples order. And update in a fashion “place updated tuple at
the end” won’t work for me.

So, I think it would be better to implement PageReplaceItem()
functionality to make code better, to make existing GiST inserts
faster and to enable new advanced page layouts in indices.

Current work is here https://github.com/x4m/pggistopt/tree/simplefix

What do you think about found performance problem and about patch
attached? Am I missing something?

Best regards, Andrey Borodin, Octonica & Ural Federal University.

Attachments:

test.sqlapplication/octet-stream; name=test.sqlDownload
GiST memmove.patchapplication/octet-stream; name="GiST memmove.patch"Download+35-0
#2Alexander Korotkov
aekorotkov@gmail.com
In reply to: Andrey Borodin (#1)
Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

Hi!

On Sun, Jul 3, 2016 at 12:24 PM, Andrew Borodin <borodin@octonica.com>
wrote:

I think there is some room for improving GiST inserts. Following is
the description of what I think is performance problem.

Function gistplacetopage in file /src/backend/access/gist/gist.c is
responsible for updating or appending new tuple on GiST page.
Currently, after checking necessity of page split due to overflow, it
essentially executes following:

if (OffsetNumberIsValid(oldoffnum))
PageIndexTupleDelete(page, oldoffnum);
gistfillbuffer(page, itup, ntup, InvalidOffsetNumber);

That is: remove old tuple if it’s there, then place updated tuple at the
end.

Half of the old data have to be shifted my memmove inside
PageIndexTupleDelete() call, half of the linp-s have to be corrected.

If the updated tuple has same size as already residing on page we can
just overwrite it. Attached patch demonstrates that concept. Attached
test.sql inserts million rows into GiST index based on cube extension.
My machine is Hyper-V VM running Ubuntu on i5-2500 CPU with SSD
storage. Before patch, insert part of test is executed on average
within 159 second, after patch application: insert part is executed
within 77 seconds on average. That is almost twice faster (for
CPU\Mem-bounded inserts, disk-constrained test will show no
improvement). But it works only for fixed-size tuple inserts.

Very promising results!

I know that code in patch is far from beautiful: it operates with

three different levels of abstraction within 5 lines of code. Those
are low level memmove(), system-wide PageAddItem() and GiST private
gistfillBuffer().

By the way PageAddItem() have overwrite flag, but it only works with
unused ItemId’s. Marking old ItemId as unused before PageAddItem()
didn’t work for me. Unfortunately bufpage.c routines do not contain
one for updating(replacing with new) tuple on page. It is important
for me because I’m working on advanced GiST page layout (

/messages/by-id/CAJEAwVE0rrr+OBT-P0gDCtXbVDkBBG_WcXwCBK=GHo4fewu3Yg@mail.gmail.com
), current approach is to use skip-tuples which can be used to skip
many flowing tuples with one key check. Obviously, this design cares
about tuples order. And update in a fashion “place updated tuple at
the end” won’t work for me.

So, I think it would be better to implement PageReplaceItem()
functionality to make code better, to make existing GiST inserts
faster and to enable new advanced page layouts in indices.

+1 for PageReplaceItem()
Even if item is not the same size, we can move the tail of page once
instead of twice.
I think you should implement PageReplaceItem() version and add it to the
commitfest.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#3Andrey Borodin
amborodin@acm.org
In reply to: Alexander Korotkov (#2)
Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

Hi!

I think you should implement PageReplaceItem() version and add it to the commitfest.

Here is the patch.
I've called function PageIndexTupleOverwrite() because it's suitable
only for indices. It works on my tests and performance is the same as
in proof-of-concept (despite some sanity checks copied from
PageIndexTupleDelete).
Next weekend I'll do more testing, and then add it to commitfest.

Best regards, Andrey Borodin, Octonica & Ural Federal University.

2016-07-03 15:21 GMT+05:00 Alexander Korotkov <a.korotkov@postgrespro.ru>:

Show quoted text

Hi!

On Sun, Jul 3, 2016 at 12:24 PM, Andrew Borodin <borodin@octonica.com>
wrote:

I think there is some room for improving GiST inserts. Following is
the description of what I think is performance problem.

Function gistplacetopage in file /src/backend/access/gist/gist.c is
responsible for updating or appending new tuple on GiST page.
Currently, after checking necessity of page split due to overflow, it
essentially executes following:

if (OffsetNumberIsValid(oldoffnum))
PageIndexTupleDelete(page, oldoffnum);
gistfillbuffer(page, itup, ntup, InvalidOffsetNumber);

That is: remove old tuple if it’s there, then place updated tuple at the
end.

Half of the old data have to be shifted my memmove inside
PageIndexTupleDelete() call, half of the linp-s have to be corrected.

If the updated tuple has same size as already residing on page we can
just overwrite it. Attached patch demonstrates that concept. Attached
test.sql inserts million rows into GiST index based on cube extension.
My machine is Hyper-V VM running Ubuntu on i5-2500 CPU with SSD
storage. Before patch, insert part of test is executed on average
within 159 second, after patch application: insert part is executed
within 77 seconds on average. That is almost twice faster (for
CPU\Mem-bounded inserts, disk-constrained test will show no
improvement). But it works only for fixed-size tuple inserts.

Very promising results!

I know that code in patch is far from beautiful: it operates with
three different levels of abstraction within 5 lines of code. Those
are low level memmove(), system-wide PageAddItem() and GiST private
gistfillBuffer().

By the way PageAddItem() have overwrite flag, but it only works with
unused ItemId’s. Marking old ItemId as unused before PageAddItem()
didn’t work for me. Unfortunately bufpage.c routines do not contain
one for updating(replacing with new) tuple on page. It is important
for me because I’m working on advanced GiST page layout (

/messages/by-id/CAJEAwVE0rrr+OBT-P0gDCtXbVDkBBG_WcXwCBK=GHo4fewu3Yg@mail.gmail.com
), current approach is to use skip-tuples which can be used to skip
many flowing tuples with one key check. Obviously, this design cares
about tuples order. And update in a fashion “place updated tuple at
the end” won’t work for me.

So, I think it would be better to implement PageReplaceItem()
functionality to make code better, to make existing GiST inserts
faster and to enable new advanced page layouts in indices.

+1 for PageReplaceItem()
Even if item is not the same size, we can move the tail of page once instead
of twice.
I think you should implement PageReplaceItem() version and add it to the
commitfest.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

PageIndexTupleOverwrite.patchapplication/octet-stream; name=PageIndexTupleOverwrite.patchDownload+123-3
#4Andrey Borodin
amborodin@acm.org
In reply to: Andrey Borodin (#3)
Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

Here is a patch with corrections from Alexander Korotkov.
My tests, check and check-world show no problems against Ubuntu LTS 14 x64 VM.

Best regards, Andrey Borodin, Octonica & Ural Federal University.

2016-07-04 10:41 GMT+05:00 Andrew Borodin <borodin@octonica.com>:

Show quoted text

Hi!

I think you should implement PageReplaceItem() version and add it to the commitfest.

Here is the patch.
I've called function PageIndexTupleOverwrite() because it's suitable
only for indices. It works on my tests and performance is the same as
in proof-of-concept (despite some sanity checks copied from
PageIndexTupleDelete).
Next weekend I'll do more testing, and then add it to commitfest.

Best regards, Andrey Borodin, Octonica & Ural Federal University.

2016-07-03 15:21 GMT+05:00 Alexander Korotkov <a.korotkov@postgrespro.ru>:

Hi!

On Sun, Jul 3, 2016 at 12:24 PM, Andrew Borodin <borodin@octonica.com>
wrote:

I think there is some room for improving GiST inserts. Following is
the description of what I think is performance problem.

Function gistplacetopage in file /src/backend/access/gist/gist.c is
responsible for updating or appending new tuple on GiST page.
Currently, after checking necessity of page split due to overflow, it
essentially executes following:

if (OffsetNumberIsValid(oldoffnum))
PageIndexTupleDelete(page, oldoffnum);
gistfillbuffer(page, itup, ntup, InvalidOffsetNumber);

That is: remove old tuple if it’s there, then place updated tuple at the
end.

Half of the old data have to be shifted my memmove inside
PageIndexTupleDelete() call, half of the linp-s have to be corrected.

If the updated tuple has same size as already residing on page we can
just overwrite it. Attached patch demonstrates that concept. Attached
test.sql inserts million rows into GiST index based on cube extension.
My machine is Hyper-V VM running Ubuntu on i5-2500 CPU with SSD
storage. Before patch, insert part of test is executed on average
within 159 second, after patch application: insert part is executed
within 77 seconds on average. That is almost twice faster (for
CPU\Mem-bounded inserts, disk-constrained test will show no
improvement). But it works only for fixed-size tuple inserts.

Very promising results!

I know that code in patch is far from beautiful: it operates with
three different levels of abstraction within 5 lines of code. Those
are low level memmove(), system-wide PageAddItem() and GiST private
gistfillBuffer().

By the way PageAddItem() have overwrite flag, but it only works with
unused ItemId’s. Marking old ItemId as unused before PageAddItem()
didn’t work for me. Unfortunately bufpage.c routines do not contain
one for updating(replacing with new) tuple on page. It is important
for me because I’m working on advanced GiST page layout (

/messages/by-id/CAJEAwVE0rrr+OBT-P0gDCtXbVDkBBG_WcXwCBK=GHo4fewu3Yg@mail.gmail.com
), current approach is to use skip-tuples which can be used to skip
many flowing tuples with one key check. Obviously, this design cares
about tuples order. And update in a fashion “place updated tuple at
the end” won’t work for me.

So, I think it would be better to implement PageReplaceItem()
functionality to make code better, to make existing GiST inserts
faster and to enable new advanced page layouts in indices.

+1 for PageReplaceItem()
Even if item is not the same size, we can move the tail of page once instead
of twice.
I think you should implement PageReplaceItem() version and add it to the
commitfest.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

PageIndexTupleOverwrite v2.patchapplication/octet-stream; name="PageIndexTupleOverwrite v2.patch"Download+123-3
#5Amit Kapila
amit.kapila16@gmail.com
In reply to: Andrey Borodin (#4)
Re: Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

On Mon, Jul 4, 2016 at 4:52 PM, Andrew Borodin <borodin@octonica.com> wrote:

Here is a patch with corrections from Alexander Korotkov.
My tests, check and check-world show no problems against Ubuntu LTS 14 x64 VM.

- PageIndexTupleDelete(page, oldoffnum);
- gistfillbuffer(page, itup, ntup, InvalidOffsetNumber);
+ {
+ /*if we have just one tuple to update we replace it on-place on page*/
+ if(ntup == 1)
+ {
+ PageIndexTupleOverwrite(page,oldoffnum,*itup);
+ }
+ else
+ {
+ /*this corner case is here to support mix calls case (see comment above)*/
+ PageIndexTupleDelete(page, oldoffnum);
+ gistfillbuffer(page, itup, ntup, InvalidOffsetNumber);
+ }

Here, you have changed the way tuple is added on a page, but still the
WAL record is same as before. I think WAL replay should perform the
action in a same way as we have done when original operation is
performed. If not, then it can change the organization of page which
might lead to failure in replay of consecutive WAL records.

+ /*if we have just one tuple to update we replace it on-place on page*/

In comments, we always leave a space both in the beginning and at the
end of a comment. See comments in nearby code.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Andrey Borodin
amborodin@acm.org
In reply to: Amit Kapila (#5)
Re: Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

Thank you, Amit.

Currently, if WAL reconstruct page in another order it won't be a problem.
How can I check that it works? Would it be sufficient if I kill 9 backend
and postmaster after commit and it will start normally executing select
queries after restart?

I'll make a patch with missing spaces later. I also found some more missing
spaces in PageIndexTupleOverwrite call and typo in comments ("on-place").

Best regards, Andrey Borodin, Octonica & Ural Federal University.

2016-07-04 18:58 GMT+05:00 Amit Kapila <amit.kapila16@gmail.com>:

Show quoted text

On Mon, Jul 4, 2016 at 4:52 PM, Andrew Borodin <borodin@octonica.com>
wrote:

Here is a patch with corrections from Alexander Korotkov.
My tests, check and check-world show no problems against Ubuntu LTS 14

x64 VM.

- PageIndexTupleDelete(page, oldoffnum);
- gistfillbuffer(page, itup, ntup, InvalidOffsetNumber);
+ {
+ /*if we have just one tuple to update we replace it on-place on page*/
+ if(ntup == 1)
+ {
+ PageIndexTupleOverwrite(page,oldoffnum,*itup);
+ }
+ else
+ {
+ /*this corner case is here to support mix calls case (see comment
above)*/
+ PageIndexTupleDelete(page, oldoffnum);
+ gistfillbuffer(page, itup, ntup, InvalidOffsetNumber);
+ }

Here, you have changed the way tuple is added on a page, but still the
WAL record is same as before. I think WAL replay should perform the
action in a same way as we have done when original operation is
performed. If not, then it can change the organization of page which
might lead to failure in replay of consecutive WAL records.

+ /*if we have just one tuple to update we replace it on-place on page*/

In comments, we always leave a space both in the beginning and at the
end of a comment. See comments in nearby code.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#7Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andrey Borodin (#6)
Re: Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

Andrew Borodin wrote:

Thank you, Amit.

Currently, if WAL reconstruct page in another order it won't be a problem.

We require that replay of WAL leads to identical bytes than the
original. Among other things, this enables use of a tool that verifies
that WAL is working correctly simply by comparing page images. So
please fix it instead of just verifying that this works for GiST.

By the way, BRIN indexes have a need of this operation too. The current
approach is to call PageIndexDeleteNoCompact followed by PageAddItem.
I suppose it would be beneficial to use your new routine there too.

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Andrey Borodin
amborodin@acm.org
In reply to: Alvaro Herrera (#7)
Re: Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

Thank you, Alvaro.

Yes, now I see. I'll update gistRedoPageUpdateRecord() to work
accordingly with patched gistplacetopage().
Also, i've noticed that PageAddItem uses MAXALIGN() macro to calculate
tuple size. So, PageIndexTupleOverwrite should behave the same.

About PageIndexDeleteNoCompact() in BRIN. I do not see why
PageIndexDeleteNoCompact() is not a good solution instead of
PageIndexTupleOverwrite? Will it mark tuple as unused until next
PageAddItem will use it's place?

Best regards, Andrey Borodin, Octonica & Ural Federal University.

2016-07-04 22:16 GMT+05:00 Alvaro Herrera <alvherre@2ndquadrant.com>:

Andrew Borodin wrote:

Thank you, Amit.

Currently, if WAL reconstruct page in another order it won't be a problem.

We require that replay of WAL leads to identical bytes than the
original. Among other things, this enables use of a tool that verifies
that WAL is working correctly simply by comparing page images. So
please fix it instead of just verifying that this works for GiST.

By the way, BRIN indexes have a need of this operation too. The current
approach is to call PageIndexDeleteNoCompact followed by PageAddItem.
I suppose it would be beneficial to use your new routine there too.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Andrey Borodin
amborodin@acm.org
In reply to: Andrey Borodin (#8)
Re: Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

Here is a patch with updated WAL behavior.

I'm not certain about MAXALIGN for size of appended tuple. Currently
if anyone calls PageIndexTupleOverwrite() with unalligned tuple it
will ereport(ERROR).
I suspect that it should allign size instead.

Also I suspect that PageIndexTupleOverwrite() should make a call to
ItemIdSetNormal() to pretend it is generic function and not just a
private part of GiST.

Best regards, Andrey Borodin, Octonica & Ural Federal University.

2016-07-04 22:59 GMT+05:00 Andrew Borodin <borodin@octonica.com>:

Show quoted text

Thank you, Alvaro.

Yes, now I see. I'll update gistRedoPageUpdateRecord() to work
accordingly with patched gistplacetopage().
Also, i've noticed that PageAddItem uses MAXALIGN() macro to calculate
tuple size. So, PageIndexTupleOverwrite should behave the same.

About PageIndexDeleteNoCompact() in BRIN. I do not see why
PageIndexDeleteNoCompact() is not a good solution instead of
PageIndexTupleOverwrite? Will it mark tuple as unused until next
PageAddItem will use it's place?

Best regards, Andrey Borodin, Octonica & Ural Federal University.

2016-07-04 22:16 GMT+05:00 Alvaro Herrera <alvherre@2ndquadrant.com>:

Andrew Borodin wrote:

Thank you, Amit.

Currently, if WAL reconstruct page in another order it won't be a problem.

We require that replay of WAL leads to identical bytes than the
original. Among other things, this enables use of a tool that verifies
that WAL is working correctly simply by comparing page images. So
please fix it instead of just verifying that this works for GiST.

By the way, BRIN indexes have a need of this operation too. The current
approach is to call PageIndexDeleteNoCompact followed by PageAddItem.
I suppose it would be beneficial to use your new routine there too.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

PageIndexTupleOverwrite v3.patchapplication/octet-stream; name="PageIndexTupleOverwrite v3.patch"Download+140-4
#10Andrey Borodin
amborodin@acm.org
In reply to: Andrey Borodin (#9)
Re: Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

Here is a new patch. Updates:
1. Uses MAXALIGN to allocate space on page
2. Uses ItemIdSetNormal in case when ItemId was not normal for some
reasons before call
3. Improved comments and var names

Best regards, Andrey Borodin, Octonica & Ural Federal University.

2016-07-05 17:54 GMT+05:00 Andrew Borodin <borodin@octonica.com>:

Show quoted text

Here is a patch with updated WAL behavior.

I'm not certain about MAXALIGN for size of appended tuple. Currently
if anyone calls PageIndexTupleOverwrite() with unalligned tuple it
will ereport(ERROR).
I suspect that it should allign size instead.

Also I suspect that PageIndexTupleOverwrite() should make a call to
ItemIdSetNormal() to pretend it is generic function and not just a
private part of GiST.

Best regards, Andrey Borodin, Octonica & Ural Federal University.

2016-07-04 22:59 GMT+05:00 Andrew Borodin <borodin@octonica.com>:

Thank you, Alvaro.

Yes, now I see. I'll update gistRedoPageUpdateRecord() to work
accordingly with patched gistplacetopage().
Also, i've noticed that PageAddItem uses MAXALIGN() macro to calculate
tuple size. So, PageIndexTupleOverwrite should behave the same.

About PageIndexDeleteNoCompact() in BRIN. I do not see why
PageIndexDeleteNoCompact() is not a good solution instead of
PageIndexTupleOverwrite? Will it mark tuple as unused until next
PageAddItem will use it's place?

Best regards, Andrey Borodin, Octonica & Ural Federal University.

2016-07-04 22:16 GMT+05:00 Alvaro Herrera <alvherre@2ndquadrant.com>:

Andrew Borodin wrote:

Thank you, Amit.

Currently, if WAL reconstruct page in another order it won't be a problem.

We require that replay of WAL leads to identical bytes than the
original. Among other things, this enables use of a tool that verifies
that WAL is working correctly simply by comparing page images. So
please fix it instead of just verifying that this works for GiST.

By the way, BRIN indexes have a need of this operation too. The current
approach is to call PageIndexDeleteNoCompact followed by PageAddItem.
I suppose it would be beneficial to use your new routine there too.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

PageIndexTupleOverwrite v4.patchapplication/octet-stream; name="PageIndexTupleOverwrite v4.patch"Download+145-5
#11Andrey Borodin
amborodin@acm.org
In reply to: Andrey Borodin (#10)
Re: Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

One more thing came to my mind:
WAL records done by the GiST before patch will corrupt GiST after
patch if replayed.
Is it a problem? May be it's prevented on some higher level?

Best regards, Andrey Borodin, Octonica & Ural Federal University.

2016-07-06 22:11 GMT+05:00 Andrew Borodin <borodin@octonica.com>:

Here is a new patch. Updates:
1. Uses MAXALIGN to allocate space on page
2. Uses ItemIdSetNormal in case when ItemId was not normal for some
reasons before call
3. Improved comments and var names

Best regards, Andrey Borodin, Octonica & Ural Federal University.

2016-07-05 17:54 GMT+05:00 Andrew Borodin <borodin@octonica.com>:

Here is a patch with updated WAL behavior.

I'm not certain about MAXALIGN for size of appended tuple. Currently
if anyone calls PageIndexTupleOverwrite() with unalligned tuple it
will ereport(ERROR).
I suspect that it should allign size instead.

Also I suspect that PageIndexTupleOverwrite() should make a call to
ItemIdSetNormal() to pretend it is generic function and not just a
private part of GiST.

Best regards, Andrey Borodin, Octonica & Ural Federal University.

2016-07-04 22:59 GMT+05:00 Andrew Borodin <borodin@octonica.com>:

Thank you, Alvaro.

Yes, now I see. I'll update gistRedoPageUpdateRecord() to work
accordingly with patched gistplacetopage().
Also, i've noticed that PageAddItem uses MAXALIGN() macro to calculate
tuple size. So, PageIndexTupleOverwrite should behave the same.

About PageIndexDeleteNoCompact() in BRIN. I do not see why
PageIndexDeleteNoCompact() is not a good solution instead of
PageIndexTupleOverwrite? Will it mark tuple as unused until next
PageAddItem will use it's place?

Best regards, Andrey Borodin, Octonica & Ural Federal University.

2016-07-04 22:16 GMT+05:00 Alvaro Herrera <alvherre@2ndquadrant.com>:

Andrew Borodin wrote:

Thank you, Amit.

Currently, if WAL reconstruct page in another order it won't be a problem.

We require that replay of WAL leads to identical bytes than the
original. Among other things, this enables use of a tool that verifies
that WAL is working correctly simply by comparing page images. So
please fix it instead of just verifying that this works for GiST.

By the way, BRIN indexes have a need of this operation too. The current
approach is to call PageIndexDeleteNoCompact followed by PageAddItem.
I suppose it would be beneficial to use your new routine there too.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Michael Paquier
michael@paquier.xyz
In reply to: Andrey Borodin (#11)
Re: Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

On Fri, Jul 8, 2016 at 2:00 PM, Andrew Borodin <borodin@octonica.com> wrote:

(Please top-post on threads, this is annoying)

One more thing came to my mind:
WAL records done by the GiST before patch will corrupt GiST after
patch if replayed.
Is it a problem? May be it's prevented on some higher level?

If a feature changes the shape of WAL records, XLOG_PAGE_MAGIC is
bumped to prevent any problems.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Andrey Borodin
amborodin@acm.org
In reply to: Michael Paquier (#12)
Re: Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

If a feature changes the shape of WAL records, XLOG_PAGE_MAGIC is bumped to prevent any problems.

I've attached patch with a bump, but, obviously, it'll be irrelevant
after any other commit changing WAL shape.

Here is a patch with updated GiST README.
I haven't found apropriate place to describe PageIndexTupleOverwrite
function in docs, since all it's siblings are described only in the
code.
Code comments describing this function are coherent with others
(PageAddItem, PageIndexTupleDelete).

Best regards, Andrey Borodin, Octonica & Ural Federal University.

Attachments:

XLog-magic-bump.patchapplication/octet-stream; name=XLog-magic-bump.patchDownload+1-2
PageIndexTupleOverwrite v5.patchapplication/octet-stream; name="PageIndexTupleOverwrite v5.patch"Download+153-5
#14Michael Paquier
michael@paquier.xyz
In reply to: Andrey Borodin (#13)
Re: Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

On Sun, Jul 24, 2016 at 7:18 PM, Andrew Borodin <borodin@octonica.com> wrote:

If a feature changes the shape of WAL records, XLOG_PAGE_MAGIC is bumped to prevent any problems.

I've attached patch with a bump, but, obviously, it'll be irrelevant
after any other commit changing WAL shape.

Usually the committer in charge of reviewing such a patch would bump
it. There is no need for the patch submitter to do so. I should have
been more precise previously, sorry for my twisted words.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#14)
Re: Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

Michael Paquier <michael.paquier@gmail.com> writes:

On Sun, Jul 24, 2016 at 7:18 PM, Andrew Borodin <borodin@octonica.com> wrote:

I've attached patch with a bump, but, obviously, it'll be irrelevant
after any other commit changing WAL shape.

Usually the committer in charge of reviewing such a patch would bump
it. There is no need for the patch submitter to do so. I should have
been more precise previously, sorry for my twisted words.

It's good to remind the committer that such a bump is needed, of course.
But yeah, casting the reminder in the form of a hunk of the patch is
more likely to cause trouble than be helpful.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andrey Borodin (#13)
Re: Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

Andrew Borodin wrote:

If a feature changes the shape of WAL records, XLOG_PAGE_MAGIC is bumped to prevent any problems.

I've attached patch with a bump, but, obviously, it'll be irrelevant
after any other commit changing WAL shape.

Here is a patch with updated GiST README.
I haven't found apropriate place to describe PageIndexTupleOverwrite
function in docs, since all it's siblings are described only in the
code.
Code comments describing this function are coherent with others
(PageAddItem, PageIndexTupleDelete).

Can you please patch BRIN to use this new function too?

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Andrey Borodin
amborodin@acm.org
In reply to: Alvaro Herrera (#16)
Re: Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

Can you please patch BRIN to use this new function too?

AFAIK Sergey Mirvoda was going to implement and test it.
It is easy to do this optimization for brin_samepage_update (see patch
draft in attachment, make check passes), but WAL of regular BRIN
update is not clear for me.

Best regards, Andrey Borodin, Octonica & Ural Federal University.

Attachments:

Brin_PageIndexTupleOverwrite.patchapplication/octet-stream; name=Brin_PageIndexTupleOverwrite.patchDownload+9-15
#18Andrey Borodin
amborodin@acm.org
In reply to: Alvaro Herrera (#16)
Re: Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

Can you please patch BRIN to use this new function too?

On my machine replacement of both BRIN update cases (see
https://github.com/x4m/pggistopt/commit/a6d301ff79104b977619339d53aebf748045418a
) showed no performance changes on folowing update query (6 seconds of
updates avg):
create table dataTable(x int, y int);
insert into dataTable(x,y) select x,y from generate_series(1,1e3,1)
x,generate_series(1,1e3,1) y;
create index idx on dataTable using brin(x,y);
update datatable set x = random()*1024, y = random()*1024;

https://gist.github.com/x4m/7e69fd924b9ffd2fdc9c6100e741171d

Probably I was looking in a wrong place. I do not see other cases when
PageIndexTupleOverwrite can improve performance of BRIN. Though I'll
make PageIndexTupleOverwrite BRIN-compatible in forthcoming patch
version: BRIN tuples have no length in header and it must be taken as
a parameter. Just as the PageAddItem do.

Best regards, Andrey Borodin, Octonica & Ural Federal University.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Andrey Borodin
amborodin@acm.org
In reply to: Alvaro Herrera (#16)
Re: Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

Here is BRIN-compatible version of patch. Now function
PageIndexTupleOverwrite consumes tuple size as a parameter, this
behavior is coherent with PageAddItem.
Also, i had to remove asserstion that item has a storage in the loop
that adjusts line pointers. It would be great if someone could check
that place (commented Assert(ItemIdHasStorage(ii)); in
PageIndexTupleOverwrite). I suspect that there might be a case when
linp's should not be adjusted.

Best regards, Andrey Borodin, Octonica & Ural Federal University.

Attachments:

PageIndexTupleOverwrite v6.patchapplication/octet-stream; name="PageIndexTupleOverwrite v6.patch"Download+158-6
#20Anastasia Lubennikova
a.lubennikova@postgrespro.ru
In reply to: Andrey Borodin (#19)
Re: Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

30.07.2016 14:00, Andrew Borodin:

Here is BRIN-compatible version of patch. Now function
PageIndexTupleOverwrite consumes tuple size as a parameter, this
behavior is coherent with PageAddItem.
Also, i had to remove asserstion that item has a storage in the loop
that adjusts line pointers. It would be great if someone could check
that place (commented Assert(ItemIdHasStorage(ii)); in
PageIndexTupleOverwrite). I suspect that there might be a case when
linp's should not be adjusted.

Hi, I reviewed the code and I have couple of questions about
following code:

/* may have negative size here if new tuple is larger */
size_diff = oldsize - alignednewsize;
offset = ItemIdGetOffset(tupid);

if (offset < phdr->pd_upper || (offset + size_diff) >
phdr->pd_special ||
offset != MAXALIGN(offset) || size_diff != MAXALIGN(size_diff))
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg("corrupted item offset: offset = %u, size = %u",
offset, (unsigned int) size_diff)));

First of all, shouldn't we use MAXALIGN(oldsize) instead of oldsize?
Although, I'm quite sure that it was already aligned somewhere before.

I doubt that the check (size_diff != MAXALIGN(size_diff)) is necessary.
I'd rather add the check: (offset+size_diff < pd_lower).

Besides that moment, the patch seems pretty good.

BTW, I'm very surprised that it improves performance so much.
And also size is reduced significantly. 89MB against 289MB without patch.
Could you explain in details, why does it happen?

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Andrey Borodin
amborodin@acm.org
In reply to: Anastasia Lubennikova (#20)
#22Anastasia Lubennikova
a.lubennikova@postgrespro.ru
In reply to: Andrey Borodin (#21)
#23Andrey Borodin
amborodin@acm.org
In reply to: Anastasia Lubennikova (#22)
#24Anastasia Lubennikova
a.lubennikova@postgrespro.ru
In reply to: Andrey Borodin (#23)
#25Andrey Borodin
amborodin@acm.org
In reply to: Anastasia Lubennikova (#24)
#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrey Borodin (#25)
#27Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#26)
#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#27)
#29Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#28)
#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#29)
#31Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#30)
#32Andrey Borodin
amborodin@acm.org
In reply to: Tom Lane (#28)
#33Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrey Borodin (#32)
#34Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#31)
#35Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#34)
#36Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#35)