WARM and indirect indexes

Started by Alvaro Herreraabout 9 years ago20 messages

alvherre@2ndquadrant.com

about 9 years ago

Two options are on the table to attack the problem of updates causing
write amplification: WARM and indirect indexes. They are completely
different approaches but have overlapping effects on what scenarios are
improved. Here's a recap of both features, with the intent that we make
a well-considered decision about each.

The main effect of both features is that an updated tuple doesn't
require updating indexes that are on unmodified columns. Indirect
indexes are a completely new server feature which may enable other
refinements later on; WARM is a targeted optimization on top of the HOT
optimization.

The big advantage of WARM is that it works automatically, like HOT: the
user doesn't need to do anything different than today to get the
benefit. With indirect indexes, the user needs to create the index as
indirect explicitely.

There are two big disadvantages to WARM (as to HOT): it cannot be
applied when the heap page is full; and concurrent long-running
transactions can spoil reclaimability of recently dead tuples in heap
pages. There's a further disadvantage: currently, there can be only one
WARM update in an update chain. (Pavan believes it is possible to allow
multiple ones.) All those cases can benefit from indirect indexes.

Another interesting case is a table with a primary key and a JSON
object, on which you have a GIN index (or an int[] array, or text
search). What happens if you modify the JSON? With WARM, this is just
a normal index update. With indirect indexes, you may be able to skip
inserting index entries for all the JSON elements except those which
changed. (However, this is not implemented yet.)

- When scanning a WARM-updated block starting from an index, you may
need to do more work to walk the update chain until you find the visible
tuple. Most of the time, HOT/WARM chains are very short thanks to HOT
pruning, so this shouldn't be a problem.

- Indirect indexes require a primary key to be present. If the PK is
dropped, the IndIx must be dropped too.

- Indirect indexes become larger if the primary key is wide.

- Indirect indexes are not fully implemented yet (need to remove
restriction of PK value being 6 bytes; also wholesale vacuuming, though
there's no universal agreement that this is strictly necessary.)

- An indirect index takes longer to read, since it needs to descend both
the IndIx itself and the primary key index.

--
ï¿½lvaro Herrera

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Bruce Momjian

bruce@momjian.us

about 9 years ago

In reply to: Alvaro Herrera (#1)

Re: WARM and indirect indexes

On Tue, Jan 10, 2017 at 04:24:42PM -0300, Alvaro Herrera wrote:

Two options are on the table to attack the problem of updates causing
write amplification: WARM and indirect indexes. They are completely
different approaches but have overlapping effects on what scenarios are
improved. Here's a recap of both features, with the intent that we make
a well-considered decision about each.

The main effect of both features is that an updated tuple doesn't
require updating indexes that are on unmodified columns. Indirect
indexes are a completely new server feature which may enable other
refinements later on; WARM is a targeted optimization on top of the HOT
optimization.

The big advantage of WARM is that it works automatically, like HOT: the
user doesn't need to do anything different than today to get the
benefit. With indirect indexes, the user needs to create the index as
indirect explicitely.

Thank you for the summary. I think we have to consider two things with
indirect indexes:

1. What percentage speedup is the _average_ user going to get? You
have to consider people who will use indirect indexes who get no benefit
or a net slowdown, and users who will get a benefit.

2. What percentage of users are going to use indirect indexes?

So, for #1 you might have users who are getting +1%, +50%, and -20%, so
maybe +10% average, and for #2 you might have 0.1%. When you multiply
them out, you get 0.01% average improvement per installation, which is
very small. Obviously, these are just wild guesses, but this is just to
make a point. If you assume WARM has been optimized, #1 gets even
lower.

I am not saying we shouldn't do it, but I am afraid that the complexity
in figuring out when to use indirect indexes, combined with the number
of users who will try them, really hurts its inclusion.

FYI, we have a similar issue in adding GUC variables, which I outlined
in this blog post:

http://momjian.us/main/blogs/pgblog/2009.html#January_10_2009

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Alvaro Herrera

alvherre@2ndquadrant.com

about 9 years ago

In reply to: Bruce Momjian (#2)

Re: WARM and indirect indexes

Bruce Momjian wrote:

1. What percentage speedup is the _average_ user going to get? You
have to consider people who will use indirect indexes who get no benefit
or a net slowdown, and users who will get a benefit.

2. What percentage of users are going to use indirect indexes?

So, for #1 you might have users who are getting +1%, +50%, and -20%, so
maybe +10% average, and for #2 you might have 0.1%. When you multiply
them out, you get 0.01% average improvement per installation, which is
very small. Obviously, these are just wild guesses, but this is just to
make a point.

Perhaps not many users will require indirect indexes; but for those that
do, the feature might be invaluable. We don't do only things that
benefit everybody -- some features are there to satisfy small
populations ("snapshot too old" is a recent example). We should of
course do, and perhaps even favor doing things that benefit everybody,
but should also do the other things.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Bruce Momjian

bruce@momjian.us

about 9 years ago

In reply to: Alvaro Herrera (#3)

Re: WARM and indirect indexes

On Tue, Jan 10, 2017 at 11:36:24PM -0300, Alvaro Herrera wrote:

Bruce Momjian wrote:

1. What percentage speedup is the _average_ user going to get? You
have to consider people who will use indirect indexes who get no benefit
or a net slowdown, and users who will get a benefit.

2. What percentage of users are going to use indirect indexes?

So, for #1 you might have users who are getting +1%, +50%, and -20%, so
maybe +10% average, and for #2 you might have 0.1%. When you multiply
them out, you get 0.01% average improvement per installation, which is
very small. Obviously, these are just wild guesses, but this is just to
make a point.

Perhaps not many users will require indirect indexes; but for those that
do, the feature might be invaluable. We don't do only things that
benefit everybody -- some features are there to satisfy small
populations ("snapshot too old" is a recent example). We should of
course do, and perhaps even favor doing things that benefit everybody,
but should also do the other things.

I never said "We should do only things that benefit everybody," so why
are you saying that? You are arguing against something I didn't say. I
am trying to make a balanced analysis, and you arguing against an
extreme position.

My point is that anything you add must be weighed against the value it
gives to users who use it, and the percentage of users who will use it.
Against that benefit, you have to look at the cost of exposing that API
to users, code complexity, maintenance, etc.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Amit Kapila

amit.kapila16@gmail.com

about 9 years ago

In reply to: Alvaro Herrera (#1)

Re: WARM and indirect indexes

On Wed, Jan 11, 2017 at 12:54 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Two options are on the table to attack the problem of updates causing
write amplification: WARM and indirect indexes. They are completely
different approaches but have overlapping effects on what scenarios are
improved. Here's a recap of both features, with the intent that we make
a well-considered decision about each.

The main effect of both features is that an updated tuple doesn't
require updating indexes that are on unmodified columns. Indirect
indexes are a completely new server feature which may enable other
refinements later on; WARM is a targeted optimization on top of the HOT
optimization.

The big advantage of WARM is that it works automatically, like HOT: the
user doesn't need to do anything different than today to get the
benefit. With indirect indexes, the user needs to create the index as
indirect explicitely.

There are two big disadvantages to WARM (as to HOT): it cannot be
applied when the heap page is full; and concurrent long-running
transactions can spoil reclaimability of recently dead tuples in heap
pages. There's a further disadvantage: currently, there can be only one
WARM update in an update chain. (Pavan believes it is possible to allow
multiple ones.) All those cases can benefit from indirect indexes.

Another interesting case is a table with a primary key and a JSON
object, on which you have a GIN index (or an int[] array, or text
search). What happens if you modify the JSON? With WARM, this is just
a normal index update. With indirect indexes, you may be able to skip
inserting index entries for all the JSON elements except those which
changed. (However, this is not implemented yet.)

I think both are of use in somewhat different scenarios, probably WARM
will be more effective and can be used in more number of cases to
reduce write amplification. It seems to me that indirect indexes are
generally useful with clustered index where data is stored in the leaf
node of the tree, that doesn't mean it can't be used in other cases as
mentioned by you. If you see the work required by indirect indexes is
justified with respect to its usecase, then I think it will be okay to
get both features as the indirect index can be used in future if
somebody implements clustered index.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Pavan Deolasee

pavan.deolasee@gmail.com

about 9 years ago

In reply to: Bruce Momjian (#2)

Re: WARM and indirect indexes

On Wed, Jan 11, 2017 at 7:55 AM, Bruce Momjian <bruce@momjian.us> wrote:

On Tue, Jan 10, 2017 at 04:24:42PM -0300, Alvaro Herrera wrote:

Two options are on the table to attack the problem of updates causing
write amplification: WARM and indirect indexes. They are completely
different approaches but have overlapping effects on what scenarios are
improved. Here's a recap of both features, with the intent that we make
a well-considered decision about each.

The main effect of both features is that an updated tuple doesn't
require updating indexes that are on unmodified columns. Indirect
indexes are a completely new server feature which may enable other
refinements later on; WARM is a targeted optimization on top of the HOT
optimization.

The big advantage of WARM is that it works automatically, like HOT: the
user doesn't need to do anything different than today to get the
benefit. With indirect indexes, the user needs to create the index as
indirect explicitely.

Thank you for the summary. I think we have to consider two things with
indirect indexes:

1. What percentage speedup is the _average_ user going to get? You
have to consider people who will use indirect indexes who get no benefit
or a net slowdown, and users who will get a benefit.

2. What percentage of users are going to use indirect indexes?

That could also be seen as an advantage to indirect indexes. While I
haven't seen the code, I believe indirect index code will only be hit if
someone actually uses them. So there won't be any overhead for other users
who do not wish to use the feature. WARM on the other hand will be "always
on" feature, even for system tables. That clearly has advantages, both from
usability perspective as well as the fact that the code will be heavily
tested. But if there are cases which get adversely affected by WARM, they
will have to pay the price for larger benefit.

To me, a better strategy is probably to focus on one of the patches, get
that in and then evaluate the second patch, both from complexity as well as
performance given that the first patch may have narrowed the gaps.

I was going to ask if we could implement indirect indexes as a separate
IndexAM. But I re-read this thread and found that you'd in fact done it
that way in the first version but then discarded it for performance
reasons. Is there a merit in evaluating that path for indirect indexes
again?

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Alvaro Herrera

alvherre@2ndquadrant.com

about 9 years ago

In reply to: Pavan Deolasee (#6)

Re: WARM and indirect indexes

Pavan Deolasee wrote:

I was going to ask if we could implement indirect indexes as a separate
IndexAM. But I re-read this thread and found that you'd in fact done it
that way in the first version but then discarded it for performance
reasons. Is there a merit in evaluating that path for indirect indexes
again?

Yeah, that was my first approach, and I got it to work to some extent,
but the design felt wrong. What I wrote was "ibtree", an indirect
version of the btree AM. The performance wasn't any better than the
current one (though neither have been optimized at all), and the code
felt very ugly, probably because it was poking holes into abstraction
layers. I also had to duplicate all pg_amop/pg_amproc catalog entries,
etc.

Doing it as a new capability on top of an existing index AM feels much
more natural and seems to lead to a more reasonable model, all things
considered.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Alvaro Herrera

alvherre@2ndquadrant.com

about 9 years ago

In reply to: Bruce Momjian (#4)

Re: WARM and indirect indexes

Bruce Momjian wrote:

My point is that anything you add must be weighed against the value it
gives to users who use it, and the percentage of users who will use it.
Against that benefit, you have to look at the cost of exposing that API
to users, code complexity, maintenance, etc.

I agree.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Andrew Dunstan

andrew.dunstan@2ndquadrant.com

about 9 years ago

In reply to: Bruce Momjian (#2)

Re: WARM and indirect indexes

On 01/10/2017 09:25 PM, Bruce Momjian wrote:

I am not saying we shouldn't do it, but I am afraid that the complexity
in figuring out when to use indirect indexes, combined with the number
of users who will try them, really hurts its inclusion.

I think you're making this out to be far more complex than it really is.
You could argue the same about a great many features. Both of these
features have upsides and downsides.

Obviously we need to get some benchmarks to we can quantify the effects,
but this complexity argument doesn't convince me at all. After all,
nobody has to use indirect indexes.

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10

Craig Ringer

craig.ringer@2ndquadrant.com

about 9 years ago

In reply to: Andrew Dunstan (#9)

Re: WARM and indirect indexes

On 11 Jan. 2017 21:29, "Andrew Dunstan" <andrew.dunstan@2ndquadrant.com>
wrote:

On 01/10/2017 09:25 PM, Bruce Momjian wrote:

I am not saying we shouldn't do it, but I am afraid that the complexity
in figuring out when to use indirect indexes, combined with the number
of users who will try them, really hurts its inclusion.

I think you're making this out to be far more complex than it really is.
You could argue the same about a great many features. Both of these
features have upsides and downsides.

Obviously we need to get some benchmarks to we can quantify the effects,
but this complexity argument doesn't convince me at all. After all, nobody
has to use indirect indexes.

Another consideration is ... say we decide it didn't work out in the real
world and doesn't see enough use, has needed more maintenance than
expected, etc. Unlikely IMO but allow that for the sake of argument.

Well... this is something we can rip out.

It's not something that'll deeply and permanently change fundamentals of
the on-disk heap representation. It doesn't add new data types we can't
easily remove. It's... just not that intrusive. Especially from a
UI/application PoV.

So our *risk* here isn't that great either. Our commitment doesn't have to
be total. There aren't semantic differences that we will break apps by
undoing if we decided to change how they worked behind the scenes, drop the
idea entirely, etc.

Sure, we'll have them in supported releases for a while even in the VERY
unlikely event that happens. But it's not like there's much code churn
there, much cost to a little used feature.

All that said, I'm far from convinced this will be niche or little used.
Nor does it look hugely intrusive or likely to be a maintenance burden. So
the cost side of the cost/benefit/risk analysis isn't exactly overwhelming.

#11

Bruce Momjian

bruce@momjian.us

about 9 years ago

In reply to: Bruce Momjian (#2)

Re: WARM and indirect indexes

On Tue, Jan 10, 2017 at 09:25:05PM -0500, Bruce Momjian wrote:

Thank you for the summary. I think we have to consider two things with
indirect indexes:

1. What percentage speedup is the _average_ user going to get? You
have to consider people who will use indirect indexes who get no benefit
or a net slowdown, and users who will get a benefit.

2. What percentage of users are going to use indirect indexes?

So, for #1 you might have users who are getting +1%, +50%, and -20%, so
maybe +10% average, and for #2 you might have 0.1%. When you multiply
them out, you get 0.01% average improvement per installation, which is
very small. Obviously, these are just wild guesses, but this is just to
make a point. If you assume WARM has been optimized, #1 gets even
lower.

Sorry to have to reply to my own email but I need some of the text
above.

Basically, with WARM, the adoption rate (#2) is 100%.

I am asking what instructions we will give users for #1 to prevent
people from using indirect indexes and getting worse performance. Are
we going to say, "Use indirect indexes on columns that are updated
frequently?" Actually, that seems pretty clear and would be easy for
users to follow.

I think the big question is that we will not know the benefits of
indirect indexes over WARM until we have implemented WARM, and if the
benefits of indirect indexes over WARM are small, and considering #2, we
might decide that it isn't worth adding it, for the reasons I already
outlined.

Therefore, I think we need WARM done first, then we can test indirect
indexes to see if they are a sufficient win to add it for the small
percentage of users who will use it.

In general, Postgres doesn't support ever possible performance tuning
option, and I think we are better for that because Postgres is simpler
to use. Going back to my blog post, if you add a feature, every user
who is considering tuning Postgres has to understand the feature and
decide if they should use it, so even for people who don't user the
feature, there is a cost, however small.

In summary, I love WARM, and might love indirect indexes too, but I need
to feel that indirect indexes are a clear win for the added complexity,
both in our code, and for the user API.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12

Alvaro Herrera

alvherre@2ndquadrant.com

about 9 years ago

In reply to: Bruce Momjian (#11)

Re: WARM and indirect indexes

Bruce Momjian wrote:

Therefore, I think we need WARM done first, then we can test indirect
indexes to see if they are a sufficient win to add it for the small
percentage of users who will use it.

Agreed -- that's my plan.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13

Bruce Momjian

bruce@momjian.us

about 9 years ago

In reply to: Pavan Deolasee (#6)

Re: WARM and indirect indexes

On Wed, Jan 11, 2017 at 12:36:24PM +0530, Pavan Deolasee wrote:

That could also be seen as an advantage to indirect indexes. While I haven't
seen the code, I believe indirect index code will only be hit if someone
actually uses them. So there won't be any overhead for other users who do not
wish to use the feature. WARM on the other hand will be "always on" feature,
even for system tables. That clearly has advantages, both from usability
perspective as well as the fact that the code will be heavily tested. But if
there are cases which get adversely affected by WARM, they will have to pay the
price for larger benefit.

To me, a better strategy is probably to focus on one of the patches, get that
in and then evaluate the second patch, both from complexity as well as
performance given that the first patch may have narrowed the gaps.

Yes, that is exactly what I suggested in a post I just sent to this
thread. With WARM always-on, it seems like the best thing to do first
because everyone will use it silently, and we can then decide if a user
controlled feature is warranted, and under what circumstances we should
recommend the user of the feature.

However, I am concerned that doing the two features serially (not in
parallel) might mean that the second feature doesn't make it into
Postgres 10, but considering we will live with this feature probably
forever, I think it is the best course.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14

Bruce Momjian

bruce@momjian.us

about 9 years ago

In reply to: Andrew Dunstan (#9)

Re: WARM and indirect indexes

On Wed, Jan 11, 2017 at 08:28:28AM -0500, Andrew Dunstan wrote:

On 01/10/2017 09:25 PM, Bruce Momjian wrote:

I am not saying we shouldn't do it, but I am afraid that the complexity
in figuring out when to use indirect indexes, combined with the number
of users who will try them, really hurts its inclusion.

I think you're making this out to be far more complex than it really is. You
could argue the same about a great many features. Both of these features
have upsides and downsides.

Right, and we do make such arguments --- what is your point?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15

Bruce Momjian

bruce@momjian.us

about 9 years ago

In reply to: Alvaro Herrera (#12)

Re: WARM and indirect indexes

On Wed, Jan 11, 2017 at 12:24:55PM -0300, Alvaro Herrera wrote:

Bruce Momjian wrote:

Therefore, I think we need WARM done first, then we can test indirect
indexes to see if they are a sufficient win to add it for the small
percentage of users who will use it.

Agreed -- that's my plan.

Thanks. I hate to pour cold water on a feature, honestly, but I don't
want us to over-react to the write amplification problem either. We
don't want to have X features to improve it when a single feature is
sufficient. (Amit might be right that the real win for indirect indexes
will be some type of clustered index, where the win might be larger.)

I know Uber dumped us "allegedly" over this issue (among other
complaints), but I am concerned we are overreacting if we change
Postgres too much to address this concern. Hence, I am arguing we don't
add both features at the same time without evaluating the need for the
second feature after the first feature is done.

Let me give an example of us not over-reacting. When we implemented HOT
(thanks Pavan), we considered the problem that our default fill factor
for heap is 100%, so there is no room for HOT updates on a full page.
Should we reduce the default fill factor when adding HOT? We decided
not to, on the hope that the first update to a row on a full page would
put the new row on a page with sufficient free space for future HOT
updates, and that has proven to be the case.

We do document lower full factors for heap to improve HOT updates, but I
think everyone feels that is mostly for good performance after the
initial table load, and that over time the frequently-updated rows will
naturally migrate to pages with sufficient free space.

My point is that we didn't over-react in that case, and the result was
fine --- read-only rows got dense storage, and frequently-updated rows
got sufficient free space for HOT, but we had to push HOT into
production to confirm we were in good shape. I am thinking we need to
complete WARM to figure out what it doesn't do well in production so we
can fairly evaluate indirect indexes.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16

Robert Haas

robertmhaas@gmail.com

about 9 years ago

In reply to: Alvaro Herrera (#1)

Re: WARM and indirect indexes

On Tue, Jan 10, 2017 at 2:24 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

The big advantage of WARM is that it works automatically, like HOT: the
user doesn't need to do anything different than today to get the
benefit. With indirect indexes, the user needs to create the index as
indirect explicitely.

However, this cuts both ways. If the WARM implementation has bugs --
either data-corrupting bugs or crash bugs or returns-wrong-answer bugs
or performance-in-corner-cases bugs -- everyone will be exposed to
them. The kinds of things that could go wrong here are in my view
very similar to the sorts of things that went wrong with multixacts.
If indirect indexes turn out to have similar problems, that's sad, but
people can decide not to use them and be no worse off than before.

(This is exactly why parallel query is now provoking periodic griping
rather than howls of agony - it's off by default in 9.6, and even if
you turn it on, the costing defaults are set fairly conservatively,
and even if it goes completely haywire due to some horrible bug, it's
very unlikely to break anything for longer than it takes you to shut
it back off. It's possible that WARM bugs would only mess up your
indexes and not your table data, which is a lot less bad, but things
that touch the on-disk format are near the top of my "this is scary"
list.)

More broadly, I don't share Bruce's negativity about indirect indexes.
My estimate of what needs to be done for them to be really useful is -
I think - higher than your estimate of what needs to be done, but I
think the concept is great. I also think that some of the concepts -
like allowing the tuple pointers to have widths other than 6 byes -
could turn out to be a good foundation for global indexes in the
future. In fact, it might be considerably easier to make an indirect
index span a partitioning hierarchy than it would be to do the same
for a regular index. But regardless of that, the feature is good for
what it offers today.

As to WARM, it scares me a lot. If zero serious data-corrupting bugs
survive to final release, or even if no more than one or two do, and
if it improves performance in general in the way some of the offered
benchmarks indicate, fantastic. But a feature of this type is by its
nature prone to eating your data in varied and subtle ways. Caveat
committer.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17

Pavan Deolasee

pavan.deolasee@gmail.com

almost 9 years ago

In reply to: Robert Haas (#16)

Re: WARM and indirect indexes

On Thu, Jan 12, 2017 at 3:08 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Jan 10, 2017 at 2:24 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

The big advantage of WARM is that it works automatically, like HOT: the
user doesn't need to do anything different than today to get the
benefit. With indirect indexes, the user needs to create the index as
indirect explicitely.

However, this cuts both ways. If the WARM implementation has bugs --
either data-corrupting bugs or crash bugs or returns-wrong-answer bugs
or performance-in-corner-cases bugs -- everyone will be exposed to
them.

IMHO WARM is way less complicated or intrusive than HOT was. It doesn't
change any of the MVCC mechanics or doesn't change when and how tuples are
marked dead or when and how dead tuples are removed. What it changes is how
tuples are indexed and accessed via index methods. So I believe bugs in
this area can possibility corrupt indexes or return wrong results, which is
bad but may have happened with many other patches we did in recent past.
The other thing the patch changes is how update-chain is maintained. In
order to quickly find the root offset while updating a tuple, we now store
the root offset in the t_ctid field of the last tuple in the chain and use
a separate bit to mark end-of-the-chain (instead of relying of t_ctid =
t_self check). That can lead to problems if chains are not maintained or
followed correctly. These changes are in the first patch of the patch
series and if you've any suggestions on how to improve that or solidify
chain following, please let me know. I was looking for some way to hide
t_ctid field to ensure that the links are only accessed via some standard
API.

I think as a developer of the patch, what I would like to know is what can
we do address concerns raised by you? What kind of tests you would like to
do to get confidence in the patch? What I've done so far is to rely on the
existing tests such as regression, isolation and pgbench. After adding
support for system tables, the code gets exercised even more during
regression tests, which is good. I also performed a few tests where I would
turn sequential scan off and then run "make installcheck" and compare
regression diffs between master and patched code. That helps because the
index access paths are used even more often. I did not find any bugs in
those tests.

My favourite test during HOT development was to run pgbench with large
number of clients and periodically check for data consistency while tests
are running, by comparing sum(tbalance), sum(bbalance) and sum(abalance)
values. I'm yet to do that kind of test with WARM because that would
require a slightly different test setup (more indexes and more update
statements), but I intend to do those tests too. I have also started
writing regression test cases which could lead to some corner cases and
share them for inclusion irrespective of WARM.

Please share your thoughts on what more can be and should be done.

Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#18

Robert Haas

robertmhaas@gmail.com

almost 9 years ago

In reply to: Pavan Deolasee (#17)

Re: WARM and indirect indexes

On Wed, Jan 11, 2017 at 11:09 PM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:

I think as a developer of the patch, what I would like to know is what can
we do address concerns raised by you? What kind of tests you would like to
do to get confidence in the patch?

Well, off the top of my head, I'd say that there are basically four
things that can be done to improve the quality of patches. These
things are, of course, not a secret:

1. Detailed manual testing.
2. Stress testing.
3. Regression testing (pg_regress, isolation, TAP) perhaps driven by
code coverage reports.
4. Detailed code review by highly-qualified individuals.

For a patch of this type, I highly recommend stress testing. Amit and
his colleagues have done a huge amount of stress testing of the hash
index patches and that's turned up quite a few bugs; sometimes the
tests had to run for many hours before any failure was provoked.

But my real point here is not that reviewing or testing the WARM patch
is any different or more difficult than for any other patch. Rather,
the issue is that the stakes are higher. Whatever a committer would
do (or ask to have done) for a patch of ordinary consequence should be
done ten times over for that one, not because it's more likely to have
bugs but because any bugs that it does have will hurt more. There's a
time for committing and moving on and a time for extreme paranoia.
IMHO, this is the latter.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19

Jim Nasby

Jim.Nasby@BlueTreble.com

almost 9 years ago

In reply to: Pavan Deolasee (#17)

Re: WARM and indirect indexes

On 1/11/17 8:09 PM, Pavan Deolasee wrote:

The other thing the patch changes is how update-chain is maintained. In
order to quickly find the root offset while updating a tuple, we now
store the root offset in the t_ctid field of the last tuple in the chain
and use a separate bit to mark end-of-the-chain (instead of relying of
t_ctid = t_self check). That can lead to problems if chains are not
maintained or followed correctly. These changes are in the first patch
of the patch series and if you've any suggestions on how to improve that
or solidify chain following, please let me know. I was looking for some
way to hide t_ctid field to ensure that the links are only accessed via
some standard API.

AIUI, that's going to affect every method of heap access except for
index scans that can skip the heap due to being all-visible. That means
the risk here is comparable to the MXID changes.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20

Bruce Momjian

bruce@momjian.us

almost 9 years ago

In reply to: Robert Haas (#16)

Re: WARM and indirect indexes

On Wed, Jan 11, 2017 at 04:38:10PM -0500, Robert Haas wrote:

More broadly, I don't share Bruce's negativity about indirect indexes.
My estimate of what needs to be done for them to be really useful is -
I think - higher than your estimate of what needs to be done, but I
think the concept is great. I also think that some of the concepts -
like allowing the tuple pointers to have widths other than 6 byes -
could turn out to be a good foundation for global indexes in the
future. In fact, it might be considerably easier to make an indirect
index span a partitioning hierarchy than it would be to do the same
for a regular index. But regardless of that, the feature is good for
what it offers today.

I am worried that indirect indexes might have such limited usefulness
with a well-designed WARM feature that the syntax/feature would be
useless for 99% of users. In talking to Alexander Korotkov, he
mentioned that indirect indexes could be used for global/cross-partition
indexes, and for index-organized tables (heap and index together in a
single file). This would greatly expand the usefulness of indirect
indexes and would be exciting.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers