A strange GiST error message or fillfactor of GiST build

Started by Kyotaro Horiguchiover 7 years ago18 messageshackers
Jump to latest
#1Kyotaro Horiguchi
horikyota.ntt@gmail.com

Hello.

In the discussion about cube's dimention limit [1], I found that
the error messages looks strange.

/messages/by-id/F0E1A404-A495-4F38-B817-06355B537E88@yandex-team.ru

postgres=# create table y as select cube(array(SELECT random() as a FROM generate_series(1,1000))) from generate_series(1,1e3,1);
SELECT 1000
postgres=# create index on y using gist(cube );
ERROR: index row size 8016 exceeds maximum 8152 for index "y_cube_idx"

This is apparently strange. This is because the message doesn't
count fill factor at the time. It is fixed by passing freespace
to gistSplit() and that allows gistfitpage() to consider
fillfactor as TODO comment within.

After the attached patch applied, the above messages becomes as
follows. (And index can be built being a bit sparse by fill
factor.)

ERROR: index row size 8016 exceeds maximum 7333 for index "y_cube_idx"

I'm not sure why 277807bd9e didn't do that completely so I may be
missing something. Is there any thoughts?

There's another issue that ununderstandable message is issued
when (the root) page cannot store two tuples. I'll send a fix for
that sooner if no one objects to check it separately.

=# create table y as select cube(array(SELECT random() as a FROM generate_series(1,900))) from generate_series(1,1e3,1);
=# create index on y using gist (cube);
ERROR: failed to add item to index page in "y_cube_idx"

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

GiST_consider_fillfactor.patchtext/x-patch; charset=us-asciiDownload+13-11
#2Andrey Borodin
amborodin@acm.org
In reply to: Kyotaro Horiguchi (#1)
Re: A strange GiST error message or fillfactor of GiST build

Hi!

29 авг. 2018 г., в 5:32, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> написал(а):

Hello.

In the discussion about cube's dimention limit [1], I found that
the error messages looks strange.

/messages/by-id/F0E1A404-A495-4F38-B817-06355B537E88@yandex-team.ru

postgres=# create table y as select cube(array(SELECT random() as a FROM generate_series(1,1000))) from generate_series(1,1e3,1);
SELECT 1000
postgres=# create index on y using gist(cube );
ERROR: index row size 8016 exceeds maximum 8152 for index "y_cube_idx"

This is apparently strange. This is because the message doesn't
count fill factor at the time. It is fixed by passing freespace
to gistSplit() and that allows gistfitpage() to consider
fillfactor as TODO comment within.

After the attached patch applied, the above messages becomes as
follows. (And index can be built being a bit sparse by fill
factor.)

We are passing freespace everywhere. Also, we pass GistInsertState, and GistState.
Maybe let's put GistState into GistInsertState, GistState already has free space, and pass just GistInsertState everywhere?

Best regards, Andrey Borodin.

#3Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Andrey Borodin (#2)
Re: A strange GiST error message or fillfactor of GiST build

Hello.

At Wed, 29 Aug 2018 10:42:59 -0300, Andrey Borodin <x4mmm@yandex-team.ru> wrote in <6FBE12B2-4F59-4DB9-BDE9-62C8801189A8@yandex-team.ru>

postgres=# create table y as select cube(array(SELECT random() as a FROM generate_series(1,1000))) from generate_series(1,1e3,1);
SELECT 1000
postgres=# create index on y using gist(cube );
ERROR: index row size 8016 exceeds maximum 8152 for index "y_cube_idx"

This is apparently strange. This is because the message doesn't
count fill factor at the time. It is fixed by passing freespace
to gistSplit() and that allows gistfitpage() to consider
fillfactor as TODO comment within.

After the attached patch applied, the above messages becomes as
follows. (And index can be built being a bit sparse by fill
factor.)

We are passing freespace everywhere. Also, we pass GistInsertState, and GistState.
Maybe let's put GistState into GistInsertState, GistState already has free space, and pass just GistInsertState everywhere?

Yeah, I thought something like that first. GISTSTATE doesn't have
freespace size but we could refactor so that all insert-related
routines use GISTInsertState and make GISTBuildState have
it. (patch 1) But this prevents future back-patching so I don't
think this acceptable.

The second patch corresponds to the original patch, which is not
srinked so much.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

0001-Refactor-parameter-of-GiST-insertion-routines.patchtext/x-patch; charset=us-asciiDownload+177-190
0002-Fix-error-message-of-gistSplit.patchtext/x-patch; charset=us-asciiDownload+7-8
#4Andrey Borodin
amborodin@acm.org
In reply to: Kyotaro Horiguchi (#3)
Re: A strange GiST error message or fillfactor of GiST build

Hello!

30 авг. 2018 г., в 2:42, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> написал(а):

At Wed, 29 Aug 2018 10:42:59 -0300, Andrey Borodin <x4mmm@yandex-team.ru> wrote in <6FBE12B2-4F59-4DB9-BDE9-62C8801189A8@yandex-team.ru>

We are passing freespace everywhere. Also, we pass GistInsertState, and GistState.
Maybe let's put GistState into GistInsertState, GistState already has free space, and pass just GistInsertState everywhere?

Yeah, I thought something like that first. GISTSTATE doesn't have
freespace size but we could refactor so that all insert-related
routines use GISTInsertState and make GISTBuildState have
it. (patch 1) But this prevents future back-patching so I don't
think this acceptable.

The patch looks good to me. Making code better for future development seem to me more important than backpatching this: error per se is cryptic but not threatening, it will be prevented by cube construction routines limitations.
But that is just my IMHO.

Best regards, Andrey Borodin.

#5Robert Haas
robertmhaas@gmail.com
In reply to: Kyotaro Horiguchi (#1)
Re: A strange GiST error message or fillfactor of GiST build

On Wed, Aug 29, 2018 at 4:32 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

After the attached patch applied, the above messages becomes as
follows. (And index can be built being a bit sparse by fill
factor.)

ERROR: index row size 8016 exceeds maximum 7333 for index "y_cube_idx"

I'm not sure why 277807bd9e didn't do that completely so I may be
missing something. Is there any thoughts?

It seems strange to me that we consider respecting the fillfactor to
be more important than letting the operation succeed. I would have
thought that the fillfactor would not apply when placing a tuple into
a completely empty page. The point of the setting is, of course, to
leave some free space available on the page for future tuples, but if
the tuples are big enough that only one fits in a page anyway, that's
pointless.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#6Alexander Korotkov
aekorotkov@gmail.com
In reply to: Robert Haas (#5)
Re: A strange GiST error message or fillfactor of GiST build

On Sat, Sep 1, 2018 at 6:03 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Aug 29, 2018 at 4:32 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

After the attached patch applied, the above messages becomes as
follows. (And index can be built being a bit sparse by fill
factor.)

ERROR: index row size 8016 exceeds maximum 7333 for index "y_cube_idx"

I'm not sure why 277807bd9e didn't do that completely so I may be
missing something. Is there any thoughts?

It seems strange to me that we consider respecting the fillfactor to
be more important than letting the operation succeed. I would have
thought that the fillfactor would not apply when placing a tuple into
a completely empty page. The point of the setting is, of course, to
leave some free space available on the page for future tuples, but if
the tuples are big enough that only one fits in a page anyway, that's
pointless.

IIRC, I've already wrote that I think we don't need GiST fillfactor
parameter at all. As you pointed, the purpose of fillfactor parameter
is to leave some free space in the pages. That, in turn, allow us to
evade the flood of page splits, which may happen when you start
insertions into freshly build and perfectly packed index. But thats
makes sense only for index building algorithms, which can pack index
pages as tight as possible. Our B-tree build algorithm is one of such
alogirhtms: at first it sorts tuples and then packs them into pages as
tight as required. But GiST is another story: GiST index build in the
pretty same as insertion tuples one by one. Yes, we have some bulk
insert optimization for GiST, but it optimizes only IO and internally
still uses picksplit. So, GiST indexes are never perfectly packed
even with fillfactor = 100. Why should we bother setting lower
fillfactor?

Thus, I would vote for removing GiST fillfactor altogether. Assuming
we can't do this for compatibility reasons, I would vote for setting
default GiST fillfactor to 100, and don't introduce new places where
we take it into account.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alexander Korotkov (#6)
Re: A strange GiST error message or fillfactor of GiST build

Alexander Korotkov <a.korotkov@postgrespro.ru> writes:

Thus, I would vote for removing GiST fillfactor altogether. Assuming
we can't do this for compatibility reasons, I would vote for setting
default GiST fillfactor to 100, and don't introduce new places where
we take it into account.

We probably can't remove the fillfactor storage parameter, both for
backwards compatibility and because I think it's implemented independently
of index type. But there's no backwards-compatibility argument against
simply ignoring it, if we conclude it's a bad idea.

regards, tom lane

#8Alexander Korotkov
aekorotkov@gmail.com
In reply to: Tom Lane (#7)
Re: A strange GiST error message or fillfactor of GiST build

On Sat, Sep 1, 2018 at 9:45 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alexander Korotkov <a.korotkov@postgrespro.ru> writes:

Thus, I would vote for removing GiST fillfactor altogether. Assuming
we can't do this for compatibility reasons, I would vote for setting
default GiST fillfactor to 100, and don't introduce new places where
we take it into account.

We probably can't remove the fillfactor storage parameter, both for
backwards compatibility and because I think it's implemented independently
of index type. But there's no backwards-compatibility argument against
simply ignoring it, if we conclude it's a bad idea.

That's a good idea. Especially if we take into account that
fillfactor is not a forever bad idea for GIST, it just doesn't look
reasonable for current building algorithm. So, we can decide to
ignore it, but if we would switch to different GiST building
algorithm, which can pack pages tightly, we can take fillfactor into
account again.

I think I need to prove my position about GiST fillfactor with some
experiments first, and then propose a patch.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#9Andrey Borodin
amborodin@acm.org
In reply to: Alexander Korotkov (#8)
Re: A strange GiST error message or fillfactor of GiST build

3 сент. 2018 г., в 20:16, Alexander Korotkov <a.korotkov@postgrespro.ru> написал(а):

That's a good idea. Especially if we take into account that
fillfactor is not a forever bad idea for GIST, it just doesn't look
reasonable for current building algorithm. So, we can decide to
ignore it, but if we would switch to different GiST building
algorithm, which can pack pages tightly, we can take fillfactor into
account again.

BTW, I think that build algorithm may be provided by opclass.

I think I need to prove my position about GiST fillfactor with some
experiments first, and then propose a patch.

FWIW fillfactor can be a cheap emulator for intrapage indexing, when you have like a lot of RAM. Though I didn't tested that.
Also I believe proper intrapage indexing is better :)

Best regards, Andrey Borodin.

#10Alexander Korotkov
aekorotkov@gmail.com
In reply to: Andrey Borodin (#9)
Re: A strange GiST error message or fillfactor of GiST build

On Mon, Sep 3, 2018 at 9:09 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

3 сент. 2018 г., в 20:16, Alexander Korotkov <a.korotkov@postgrespro.ru> написал(а):
That's a good idea. Especially if we take into account that
fillfactor is not a forever bad idea for GIST, it just doesn't look
reasonable for current building algorithm. So, we can decide to
ignore it, but if we would switch to different GiST building
algorithm, which can pack pages tightly, we can take fillfactor into
account again.

BTW, I think that build algorithm may be provided by opclass.

I don't think that providing the whole building algorithm by opclass
is a good idea. But we could rather make opclass provide some
essential routines for build algorithm. For instance, we may
implement sorting-based build algorithm for GiST (like one we have for
B-tree). It wouldn't work for all the opclass'es, but definitely
should work for some of them. Geometrical opclass'es may sort values
by some kind of space-filling curve. In this case, opclass can define
a comparison function.

I think I need to prove my position about GiST fillfactor with some
experiments first, and then propose a patch.

FWIW fillfactor can be a cheap emulator for intrapage indexing, when you have like a lot of RAM. Though I didn't tested that.
Also I believe proper intrapage indexing is better :)

It might be somewhat useful side effect for developers. But it's
definitely doesn't look like a feature we should encourage users to
use.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#11Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Alexander Korotkov (#10)
Re: A strange GiST error message or fillfactor of GiST build

At Mon, 3 Sep 2018 23:05:23 +0300, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote in <CAPpHfdsruk4eEkVTUFTHV7tg9RXQHqvOvBY18p-5pBNqp+PU6w@mail.gmail.com>

On Mon, Sep 3, 2018 at 9:09 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

3 сент. 2018 г., в 20:16, Alexander Korotkov <a.korotkov@postgrespro.ru> написал(а):
That's a good idea. Especially if we take into account that
fillfactor is not a forever bad idea for GIST, it just doesn't look
reasonable for current building algorithm. So, we can decide to
ignore it, but if we would switch to different GiST building
algorithm, which can pack pages tightly, we can take fillfactor into
account again.

BTW, I think that build algorithm may be provided by opclass.

I don't think that providing the whole building algorithm by opclass
is a good idea. But we could rather make opclass provide some
essential routines for build algorithm. For instance, we may
implement sorting-based build algorithm for GiST (like one we have for
B-tree). It wouldn't work for all the opclass'es, but definitely
should work for some of them. Geometrical opclass'es may sort values
by some kind of space-filling curve. In this case, opclass can define
a comparison function.

I think I need to prove my position about GiST fillfactor with some
experiments first, and then propose a patch.

FWIW fillfactor can be a cheap emulator for intrapage indexing, when you have like a lot of RAM. Though I didn't tested that.
Also I believe proper intrapage indexing is better :)

It might be somewhat useful side effect for developers. But it's
definitely doesn't look like a feature we should encourage users to
use.

I agree that fillfactor should be ignored in certain cases
(inserting the first tuple into empty pages or something like
that). Even though GiST doesn't need fillfactor at all, it is
defined independently from AM instances and it gives some
(undocumented) convenient on testing even on GiST.

So, does the following makes sense?

- Preserve fillfactor on GiST but set its default to 100%.

- Ignore fillfactor iff the first tuple for the new page fail
with it but succeed without it.

- Modify GiST internal routines to bring around GISTInsertState
so that they can see fillfactor or any other parameters without
further modification.

/messages/by-id/20180830.144209.208080135.horiguchi.kyotaro@lab.ntt.co.jp

# A storm (literally) is coming behind...

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#12Alexander Korotkov
aekorotkov@gmail.com
In reply to: Kyotaro Horiguchi (#11)
Re: A strange GiST error message or fillfactor of GiST build

On Tue, Sep 4, 2018 at 12:16 PM Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

I agree that fillfactor should be ignored in certain cases
(inserting the first tuple into empty pages or something like
that). Even though GiST doesn't need fillfactor at all, it is
defined independently from AM instances

What exactly do you mean? Yes, fillfactor is defined in StdRdOptions
struct, which is shared across many access methods. But some of them
uses fillfactor, while others don't. For instance, GIN and BRIN don't
support fillfactor at all.

and it gives some
(undocumented) convenient on testing even on GiST.

Do you keep in the mind some particular use cases? If so, could you
please share them. It would let me and others understand what
behavior we need to preserve and why.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#13Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Alexander Korotkov (#12)
Re: A strange GiST error message or fillfactor of GiST build

At Tue, 4 Sep 2018 13:05:55 +0300, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote in <CAPpHfdsJwptVsRnBtTdtj+G+gTTfSrqdH1uwLNkvQ-72A9scgw@mail.gmail.com>

On Tue, Sep 4, 2018 at 12:16 PM Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

I agree that fillfactor should be ignored in certain cases
(inserting the first tuple into empty pages or something like
that). Even though GiST doesn't need fillfactor at all, it is
defined independently from AM instances

What exactly do you mean? Yes, fillfactor is defined in StdRdOptions
struct, which is shared across many access methods. But some of them
uses fillfactor, while others don't. For instance, GIN and BRIN don't
support fillfactor at all.

Yes. It was with the intent of keeping Andrey's use case. So I
proposed that just disabling it rather than removing the code at
all.

and it gives some
(undocumented) convenient on testing even on GiST.

Do you keep in the mind some particular use cases? If so, could you
please share them. It would let me and others understand what
behavior we need to preserve and why.

I misunderstood the consensus here, I myself don't have a proper
one. The consensus here is:

fillfactor is useless for GiST.
We don't preserve it for Andrey's use case.
No one is objecting to ignoring it here.

Is it right?

Well, I propose the first attached instaed. It just removes
fillfactor handling from GiST.

Apart from the fillfactor removal, we have an internal error for
rootsplit failure caused by too-long tuples, but this should be a
user error message like gistSplit. (index row size %zu exeeds..)

=# create table y as select cube(array(SELECT random() as a FROM generate_series(1,600))) from generate_series(1,1e3,1);
=# create index on y using gist(cube);
ERROR: failed to add item to index page in "y_cube_idx"

The attached second patch changes the message in the case.

ERROR: size of 2 index rows (9640 bytes) exceeds maximum 8152 of the root page for the index "y_cube_idx"

This is one of my first motivation here but now this might be
another issue.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

v1-0001-Ignore-fillfactor-in-GiST.patchtext/x-patch; charset=us-asciiDownload+23-38
v1-0002-More-friendly-error-message-for-rootsplit-failure-of.patchtext/x-patch; charset=us-asciiDownload+27-1
#14Andrey Borodin
amborodin@acm.org
In reply to: Kyotaro Horiguchi (#13)
Re: A strange GiST error message or fillfactor of GiST build

5 сент. 2018 г., в 13:29, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> написал(а):

We don't preserve it for Andrey's use case.

Just my 2 cents: that was a hacky use case for development reasons. I think that removing fillfactor is good idea and your latest patch looks good from my POV.

Best regards, Andrey Borodin.

#15Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Andrey Borodin (#14)
Re: A strange GiST error message or fillfactor of GiST build

Hello.

Just my 2 cents: that was a hacky use case for development reasons.

I know. So I intended to preserve the parameter with default of 100% if
no one strongly objects to preserve. Then I abandoned that since I had:p

I think that removing fillfactor is good idea and your latest patch
looks good from my POV.

Thanks.

reagrds.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#16Bruce Momjian
bruce@momjian.us
In reply to: Kyotaro Horiguchi (#15)
Re: A strange GiST error message or fillfactor of GiST build

On Thu, Sep 6, 2018 at 04:16:45PM +0900, Kyotaro HORIGUCHI wrote:

Hello.

Just my 2 cents: that was a hacky use case for development reasons.

I know. So I intended to preserve the parameter with default of 100% if no
one strongly objects to preserve. Then I abandoned that since I had:p

So, are we going to output a notice if a non-100% fill factor is
specified?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#16)
Re: A strange GiST error message or fillfactor of GiST build

Bruce Momjian <bruce@momjian.us> writes:

So, are we going to output a notice if a non-100% fill factor is
specified?

I would not think that's helpful.

regards, tom lane

#18Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Tom Lane (#17)
Re: A strange GiST error message or fillfactor of GiST build

Hello.

At Fri, 07 Sep 2018 16:12:29 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in <19695.1536351149@sss.pgh.pa.us>

Bruce Momjian <bruce@momjian.us> writes:

So, are we going to output a notice if a non-100% fill factor is
specified?

I would not think that's helpful.

I agree. My understanding is that:

It has'nt been worked as expected anyway thus we ignore it
instead of removing not to break existing setup, and with silence
not to surprise someone using it believing it works.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center