To Do wiki

Started by Jeff Janesalmost 14 years ago6 messages
#1Jeff Janes
jeff.janes@gmail.com

The To Do wiki says not to add things to the page with discussing here.

So here are some things to discuss. Assuming the discussion is a
brief yup or nope, it seems to make sense to lump them into one email:

Vacuuming a table with a large GIN index is painfully slow, because
the index is vacuumed in logical order not physical order. Is making
a vacuum in physical order a to-do? Does this belong to vacuuming, or
to GIN indexing? Looking at the complexity of how this was done for
btree index, I would say this is far from easy. I wonder if there is
an easier way that is still good enough, for example every time you
split a page, check to see if a vacuum is in the index, and if so only
move tuples physically rightward. If the table is so active that
there is essentially always a vacuum in the index, this could lead to
bloat. But if the table is that large and active, under the current
non-physical order the vacuum would likely take approximately forever
to finish and so the bloat would be just as bad under that existing
system.

"Speed up COUNT(*)" is marked as done. While index-only-scans should
speed this up in certain cases, it is nothing compared to the speed up
that could be obtained by "use a fixed row count and a +/- count to
follow MVCC visibility rules", and that speed-up is the one people
used to MyISAM are expecting. We might not want to actually implement
the fixed row count +/- MVCC count idea, but we probably shouldn't
mark the whole thing as done because just one approach to it was
implemented.

sort_support was implemented for plain tuple sorting only, To Do is
extend to index-creation sorts (item 2 from message
<1698.1323222387@sss.pgh.pa.us>)

Cheers,

Jeff

#2Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Jeff Janes (#1)
Re: To Do wiki

On 10.04.2012 03:32, Jeff Janes wrote:

The To Do wiki says not to add things to the page with discussing here.

So here are some things to discuss. Assuming the discussion is a
brief yup or nope, it seems to make sense to lump them into one email:

Vacuuming a table with a large GIN index is painfully slow, because
the index is vacuumed in logical order not physical order. Is making
a vacuum in physical order a to-do? Does this belong to vacuuming, or
to GIN indexing? Looking at the complexity of how this was done for
btree index, I would say this is far from easy. I wonder if there is
an easier way that is still good enough, for example every time you
split a page, check to see if a vacuum is in the index, and if so only
move tuples physically rightward. If the table is so active that
there is essentially always a vacuum in the index, this could lead to
bloat. But if the table is that large and active, under the current
non-physical order the vacuum would likely take approximately forever
to finish and so the bloat would be just as bad under that existing
system.

Yup, seems like a todo. It doesn't sound like a good idea to force
tuples to be moved right when a vacuum is in progress, that could lead
to bloating, but it should be feasible to implement the same
cycleid-mechanism in gin that we did in b-tree.

"Speed up COUNT(*)" is marked as done. While index-only-scans should
speed this up in certain cases, it is nothing compared to the speed up
that could be obtained by "use a fixed row count and a +/- count to
follow MVCC visibility rules", and that speed-up is the one people
used to MyISAM are expecting. We might not want to actually implement
the fixed row count +/- MVCC count idea, but we probably shouldn't
mark the whole thing as done because just one approach to it was
implemented.

I think the way we'd speed up COUNT(*) further would be to implement
materialized views. Then you could define a materialized view on
COUNT(*), and essentially get a row counter similar to MyISAM. I think
it's fair to mark this as done.

sort_support was implemented for plain tuple sorting only, To Do is
extend to index-creation sorts (item 2 from message
<1698.1323222387@sss.pgh.pa.us>)

Index-creation sorts are already handled, Tom is referring to using the
new comparator API for index searches in that email. The change would go
to _bt_compare().

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#3Greg Stark
stark@mit.edu
In reply to: Heikki Linnakangas (#2)
Re: To Do wiki

On Tue, Apr 10, 2012 at 7:27 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

I think the way we'd speed up COUNT(*) further would be to implement
materialized views. Then you could define a materialized view on COUNT(*),
and essentially get a row counter similar to MyISAM. I think it's fair to
mark this as done.

If only because it comes up so frequently it would be good to have
this noted in the TODO, either under materialized views or as a
pointer to them.

A good materialized views implementation including automatically
determining what delta data to keep sure would be nice to have.

--
greg

#4Jeff Janes
jeff.janes@gmail.com
In reply to: Heikki Linnakangas (#2)
Re: To Do wiki

On Mon, Apr 9, 2012 at 11:27 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

On 10.04.2012 03:32, Jeff Janes wrote:

The To Do wiki says not to add things to the page with discussing here.

...

sort_support was implemented for plain tuple sorting only, To Do is
extend to index-creation sorts (item 2 from message
<1698.1323222387@sss.pgh.pa.us>)

Index-creation sorts are already handled, Tom is referring to using the new
comparator API for index searches in that email. The change would go to
_bt_compare().

If I do "select count(distinct bid) from pgbench_accounts" I get many
calls to btint4fastcmp, but if I do "create index on pgbench_accounts
(bid)" I instead get many calls to btint4cmp. If the index build is
using SortSupport, shouldn't it also be calling btint4fastcmp like the
distinct does?

Cheers,

Jeff

#5Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Jeff Janes (#4)
Re: To Do wiki

On 10.04.2012 18:31, Jeff Janes wrote:

On Mon, Apr 9, 2012 at 11:27 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

On 10.04.2012 03:32, Jeff Janes wrote:

The To Do wiki says not to add things to the page with discussing here.

...

sort_support was implemented for plain tuple sorting only, To Do is
extend to index-creation sorts (item 2 from message
<1698.1323222387@sss.pgh.pa.us>)

Index-creation sorts are already handled, Tom is referring to using the new
comparator API for index searches in that email. The change would go to
_bt_compare().

If I do "select count(distinct bid) from pgbench_accounts" I get many
calls to btint4fastcmp, but if I do "create index on pgbench_accounts
(bid)" I instead get many calls to btint4cmp. If the index build is
using SortSupport, shouldn't it also be calling btint4fastcmp like the
distinct does?

Oh, sorry, you're right. I stand corrected.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#6Peter Geoghegan
peter@2ndquadrant.com
In reply to: Heikki Linnakangas (#5)
Re: To Do wiki

On 10 April 2012 16:40, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

On 10.04.2012 18:31, Jeff Janes wrote:

If I do "select count(distinct bid) from pgbench_accounts" I get many
calls to btint4fastcmp, but if I do "create index on pgbench_accounts
(bid)" I instead get many calls to btint4cmp.  If the index build is
using SortSupport, shouldn't it also be calling btint4fastcmp like the
distinct does?

Oh, sorry, you're right. I stand corrected.

There is an impedance mismatch between tuplesort_begin_heap and
tuplesort_begin_index_btree that prevented this from being done with
the initial commit. Strangely, the SortSupport commit message didn't
comment on this.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services