Deprecating Hash Indexes
As discussed on other threads, Hash Indexes are currently a broken
feature and bring us into disrepute.
I describe them as broken because they are neither crash safe, nor
could they properly be called unlogged indexes (or if so, why just
hash?). And also because if somebody suggested implementing them the
way they are now, they would be told they were insane because silent
data corruption is not something we tolerate anymore. We know why they
are like that, but its time to remove the rest of the historical
research legacy. It's hard to say "We respect your data [except if you
press here]" and be taken seriously.
Suggested actions are
* Put WARNINGs in the docs against the use of hash indexes, backpatch
to 8.3. CREATE INDEX gives no warning currently, though Index Types
does mention a caution.
* Mention in the current docs that hash indexes are likely to be
deprecated completely in future releases. Should anybody ever make
them work, we can change that advice quickly but I don't think we're
going to.
Personally, I would like to see them removed into a contrib module to
allow people to re-add them if they understand the risks. ISTM better
to confiscate all foot-guns before they go off and then re-issue them
to marksmen, at the risk of annoying the people that use them with
full knowledge but that's likely a contentious issue.
Alternate views?
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 10/14/2012 09:45 AM, Simon Riggs wrote:
As discussed on other threads, Hash Indexes are currently a broken
feature and bring us into disrepute.I describe them as broken because they are neither crash safe, nor
could they properly be called unlogged indexes (or if so, why just
hash?). And also because if somebody suggested implementing them the
way they are now, they would be told they were insane because silent
data corruption is not something we tolerate anymore. We know why they
are like that, but its time to remove the rest of the historical
research legacy. It's hard to say "We respect your data [except if you
press here]" and be taken seriously.Suggested actions are
* Put WARNINGs in the docs against the use of hash indexes, backpatch
to 8.3. CREATE INDEX gives no warning currently, though Index Types
does mention a caution.* Mention in the current docs that hash indexes are likely to be
deprecated completely in future releases. Should anybody ever make
them work, we can change that advice quickly but I don't think we're
going to.Personally, I would like to see them removed into a contrib module to
allow people to re-add them if they understand the risks. ISTM better
to confiscate all foot-guns before they go off and then re-issue them
to marksmen, at the risk of annoying the people that use them with
full knowledge but that's likely a contentious issue.Alternate views?
I think we'd be better off to start by implementing unlogged indexes
generally. I have a client who is using hash indexes for the performance
gain on a very heavy insert load precisely because they are unlogged,
and who can get away with it because all their index lookup requirements
are equality. They are aware of the consequences of a crash, and can
manage the risk accordingly.
But there is a lot of attraction in the idea of unlogged btree indexes
for example. And the danger of an unlogged index is substantially less
than that of an unlogged table. After all, your data is still there,
quite crash-safe, and you can thus just rebuild the index after a crash.
cheers
andrew
Simon Riggs <simon@2ndQuadrant.com> writes:
As discussed on other threads, Hash Indexes are currently a broken
feature and bring us into disrepute.
Yeah ...
Suggested actions are
I find it curious that you don't even bother to list "add WAL support to
hash indexes" as a possible solution. It's certainly doable, and
probably not even very much more work than something like moving hash
support to a contrib module would be. (Assuming that's even possible,
which I doubt, because the core code relies on hash index opclasses even
if not on hash indexes.)
regards, tom lane
On 14 October 2012 17:47, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Simon Riggs <simon@2ndQuadrant.com> writes:
As discussed on other threads, Hash Indexes are currently a broken
feature and bring us into disrepute.Yeah ...
Suggested actions are
I find it curious that you don't even bother to list "add WAL support to
hash indexes" as a possible solution. It's certainly doable, and
probably not even very much more work than something like moving hash
support to a contrib module would be. (Assuming that's even possible,
which I doubt, because the core code relies on hash index opclasses even
if not on hash indexes.)
It is doable, yes.
I think that's at least 2-3 months total work, much of it low-level
bug fixing and eventual production bug hunting. Its gone for 9.3
already, so the earliest this would see production code is Sept 2014.
It's a task beyond most people's ability and beyond the range of
professionals without funding. I don't see anyone funding a medium-low
importance feature ahead of the higher ones out there, so I don't
expect the situation to change for (more) years.
If you personally have time to spare on performance features, there
are many optimizer tasks more worthy of your attention and more
popular also, so I don't ask for a code sprint on this.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sun, Oct 14, 2012 at 9:45 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
* Put WARNINGs in the docs against the use of hash indexes, backpatch
to 8.3. CREATE INDEX gives no warning currently, though Index Types
does mention a caution.
I'd be in favor of adding such warnings to the documentation if they
are not there already, and possibly even warning on CREATE INDEX ..
USING hash(). I don't think I'd go so far as to say that we should
imply that they'll be removed in a future release. Given how deeply
intertwined they are with the planner, I doubt that that will happen;
and I think there is enough interest in the technology that it's
likely to eventually be fixed.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sunday, October 14, 2012 03:45:49 PM Simon Riggs wrote:
As discussed on other threads, Hash Indexes are currently a broken
feature and bring us into disrepute.I describe them as broken because they are neither crash safe, nor
could they properly be called unlogged indexes (or if so, why just
hash?). And also because if somebody suggested implementing them the
way they are now, they would be told they were insane because silent
data corruption is not something we tolerate anymore. We know why they
are like that, but its time to remove the rest of the historical
research legacy. It's hard to say "We respect your data [except if you
press here]" and be taken seriously.Suggested actions are
* Put WARNINGs in the docs against the use of hash indexes, backpatch
to 8.3. CREATE INDEX gives no warning currently, though Index Types
does mention a caution.* Mention in the current docs that hash indexes are likely to be
deprecated completely in future releases. Should anybody ever make
them work, we can change that advice quickly but I don't think we're
going to.Personally, I would like to see them removed into a contrib module to
allow people to re-add them if they understand the risks. ISTM better
to confiscate all foot-guns before they go off and then re-issue them
to marksmen, at the risk of annoying the people that use them with
full knowledge but that's likely a contentious issue.Alternate views?
I vote for at least logging a wal record when a hash index is modified which
uses incomplete actions to set 'indisready = false' in case its replayed. That
should only use a rather minor amount of code and should help users to find
problems faster.
Greetings,
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 15 October 2012 15:13, Robert Haas <robertmhaas@gmail.com> wrote:
On Sun, Oct 14, 2012 at 9:45 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
* Put WARNINGs in the docs against the use of hash indexes, backpatch
to 8.3. CREATE INDEX gives no warning currently, though Index Types
does mention a caution.I'd be in favor of adding such warnings to the documentation if they
are not there already, and possibly even warning on CREATE INDEX ..
USING hash().
Sounds like a good idea.
I don't think I'd go so far as to say that we should
imply that they'll be removed in a future release. Given how deeply
intertwined they are with the planner, I doubt that that will happen;
and I think there is enough interest in the technology that it's
likely to eventually be fixed.
Hash indexes aren't used in the planner. Hash joins use completely
separate code.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 15 October 2012 15:19, Andres Freund said...
I vote for at least logging a wal record when a hash index is modified which
uses incomplete actions to set 'indisready = false' in case its replayed. That
should only use a rather minor amount of code and should help users to find
problems faster.
Good idea, though might be harder than it first appears.
How do we issue just one of those per checkpoint, to minimise WAL? How
do we make that change with a physical update WAL? Non-transactional
update? During recovery?
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Oct 15, 2012 at 12:59 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
I don't think I'd go so far as to say that we should
imply that they'll be removed in a future release. Given how deeply
intertwined they are with the planner, I doubt that that will happen;
and I think there is enough interest in the technology that it's
likely to eventually be fixed.Hash indexes aren't used in the planner. Hash joins use completely
separate code.
It's not really completely separate, because to do a hash join we have
to find a hash function for the relevant data types, and IIUC we do
that by looking up the default hash opclass for the datatype and
finding its first support function. Of course, if we were to remove
the hash AM, then you couldn't define a hash opclass against it.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Monday, October 15, 2012 07:03:35 PM Simon Riggs wrote:
On 15 October 2012 15:19, Andres Freund said...
I vote for at least logging a wal record when a hash index is modified
which uses incomplete actions to set 'indisready = false' in case its
replayed. That should only use a rather minor amount of code and should
help users to find problems faster.Good idea, though might be harder than it first appears.
How do we issue just one of those per checkpoint, to minimise WAL?
I was thinking per checkpoint, per backend in order to not add any new locks.
How do we make that change with a physical update WAL? Non-transactional
update? During recovery?
Thats why I suggested using the incomplete actions/cleanup stuff, so we can do
the change when replay finished. Thats not enough for HS though... Can we get
away with putting a if (RecoveryInProgress()) ereport(...) in there?
Greetings,
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 15 October 2012 18:07, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Oct 15, 2012 at 12:59 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
I don't think I'd go so far as to say that we should
imply that they'll be removed in a future release. Given how deeply
intertwined they are with the planner, I doubt that that will happen;
and I think there is enough interest in the technology that it's
likely to eventually be fixed.Hash indexes aren't used in the planner. Hash joins use completely
separate code.It's not really completely separate, because to do a hash join we have
to find a hash function for the relevant data types, and IIUC we do
that by looking up the default hash opclass for the datatype and
finding its first support function. Of course, if we were to remove
the hash AM, then you couldn't define a hash opclass against it.
Presumably it defaults to hash_any() but I get the picture.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Oct 15, 2012 at 10:13:24AM -0400, Robert Haas wrote:
On Sun, Oct 14, 2012 at 9:45 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
* Put WARNINGs in the docs against the use of hash indexes, backpatch
to 8.3. CREATE INDEX gives no warning currently, though Index Types
does mention a caution.I'd be in favor of adding such warnings to the documentation if they
are not there already, and possibly even warning on CREATE INDEX ..
USING hash(). I don't think I'd go so far as to say that we should
imply that they'll be removed in a future release. Given how deeply
intertwined they are with the planner, I doubt that that will happen;
and I think there is enough interest in the technology that it's
likely to eventually be fixed.
+1 for adding more warnings but do not deprecate them.
Regards,
Ken
Simon,
* Put WARNINGs in the docs against the use of hash indexes, backpatch
to 8.3. CREATE INDEX gives no warning currently, though Index Types
does mention a caution.
I'd be in favor of a warning on create index.
Also, are hash indexes replicated?
* Mention in the current docs that hash indexes are likely to be
deprecated completely in future releases. Should anybody ever make
them work, we can change that advice quickly but I don't think we're
going to.
I'm not sure that's true, necessarily. The nice thing about work on
hash indexes is that it's potentially rather self-contained, i.e. a good
GSOC project. However ...
Personally, I would like to see them removed into a contrib module to
allow people to re-add them if they understand the risks. ISTM better
to confiscate all foot-guns before they go off and then re-issue them
to marksmen, at the risk of annoying the people that use them with
full knowledge but that's likely a contentious issue.
I would be in favor of moving them to contrib for 9.4. Assuming that
someone can figure out how this interacts with the existing system table
opclasses. Them being in /contrib would also put less pressure on the
next new hacker who decides to take them on as a feature; they can
improve them incrementally without needing to fix 100% of issues in the
first go.
So, +1 with modifications ...
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On Monday, October 15, 2012 08:14:51 PM Josh Berkus wrote:
* Put WARNINGs in the docs against the use of hash indexes, backpatch
to 8.3. CREATE INDEX gives no warning currently, though Index Types
does mention a caution.I'd be in favor of a warning on create index.
Also, are hash indexes replicated?
No. As they aren't WAL logged they can't be transported via wal based
replication (PITR/HS/SR). Which rather quickly results in a very broken setup,
there has been at least one bug on -bugs because of this and there have been
several people in the irc channel experiencing this.
* Mention in the current docs that hash indexes are likely to be
deprecated completely in future releases. Should anybody ever make
them work, we can change that advice quickly but I don't think we're
going to.I'm not sure that's true, necessarily. The nice thing about work on
hash indexes is that it's potentially rather self-contained, i.e. a good
GSOC project. However ...
While self contained I fear you still need quite a bit more knowledge than
usual students have.
Greetings,
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Oct 15, 2012 at 11:14 AM, Josh Berkus <josh@agliodbs.com> wrote:
I would be in favor of moving them to contrib for 9.4. Assuming that
someone can figure out how this interacts with the existing system table
opclasses. Them being in /contrib would also put less pressure on the
next new hacker who decides to take them on as a feature; they can
improve them incrementally without needing to fix 100% of issues in the
first go.
Is there anything currently in contrib that defines its own WAL
records and replay methods? Are there hooks for doing so?
Cheers,
Jeff
On Monday, October 15, 2012 08:46:40 PM Jeff Janes wrote:
On Mon, Oct 15, 2012 at 11:14 AM, Josh Berkus <josh@agliodbs.com> wrote:
I would be in favor of moving them to contrib for 9.4. Assuming that
someone can figure out how this interacts with the existing system table
opclasses. Them being in /contrib would also put less pressure on the
next new hacker who decides to take them on as a feature; they can
improve them incrementally without needing to fix 100% of issues in the
first go.Is there anything currently in contrib that defines its own WAL
records and replay methods? Are there hooks for doing so?
It's not really possible as rmgr.c declares a const array of resource managers.
A contrib module can't sensibly add itself to that. I think changing this has
been discussed/proposed in the past, but -hackers wasn't convinced...
But then, the idea is to add it to -contrib while no WAL support exists..
Personally I don't see a point in -contrib'ing it. I would rather see it throw
errors in dangerous situations and be done with that.
Regards,
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 15 October 2012 19:46, Jeff Janes <jeff.janes@gmail.com> wrote:
On Mon, Oct 15, 2012 at 11:14 AM, Josh Berkus <josh@agliodbs.com> wrote:
I would be in favor of moving them to contrib for 9.4. Assuming that
someone can figure out how this interacts with the existing system table
opclasses. Them being in /contrib would also put less pressure on the
next new hacker who decides to take them on as a feature; they can
improve them incrementally without needing to fix 100% of issues in the
first go.Is there anything currently in contrib that defines its own WAL
records and replay methods? Are there hooks for doing so?
Not to date. Search for rmgr plugins for previous discussions.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Oct 15, 2012 at 11:49 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On Monday, October 15, 2012 08:46:40 PM Jeff Janes wrote:
On Mon, Oct 15, 2012 at 11:14 AM, Josh Berkus <josh@agliodbs.com> wrote:
I would be in favor of moving them to contrib for 9.4. Assuming that
someone can figure out how this interacts with the existing system table
opclasses. Them being in /contrib would also put less pressure on the
next new hacker who decides to take them on as a feature; they can
improve them incrementally without needing to fix 100% of issues in the
first go.Is there anything currently in contrib that defines its own WAL
records and replay methods? Are there hooks for doing so?It's not really possible as rmgr.c declares a const array of resource managers.
A contrib module can't sensibly add itself to that. I think changing this has
been discussed/proposed in the past, but -hackers wasn't convinced...But then, the idea is to add it to -contrib while no WAL support exists..
Which then virtually guarantees that WAL support never will exist, doesn't it?
Personally I don't see a point in -contrib'ing it. I would rather see it throw
errors in dangerous situations and be done with that.
+1
Cheers,
Jeff
On Mon, Oct 15, 2012 at 11:46:40AM -0700, Jeff Janes wrote:
On Mon, Oct 15, 2012 at 11:14 AM, Josh Berkus <josh@agliodbs.com> wrote:
I would be in favor of moving them to contrib for 9.4. Assuming that
someone can figure out how this interacts with the existing system table
opclasses. Them being in /contrib would also put less pressure on the
next new hacker who decides to take them on as a feature; they can
improve them incrementally without needing to fix 100% of issues in the
first go.Is there anything currently in contrib that defines its own WAL
records and replay methods? Are there hooks for doing so?Cheers,
Jeff
That is a good point. Please do not move it to contrib if that will make
it even harder/impossible to add WAL support.
Regards,
Ken
On 15 October 2012 20:04, Jeff Janes <jeff.janes@gmail.com> wrote:
On Mon, Oct 15, 2012 at 11:49 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On Monday, October 15, 2012 08:46:40 PM Jeff Janes wrote:
On Mon, Oct 15, 2012 at 11:14 AM, Josh Berkus <josh@agliodbs.com> wrote:
I would be in favor of moving them to contrib for 9.4. Assuming that
someone can figure out how this interacts with the existing system table
opclasses. Them being in /contrib would also put less pressure on the
next new hacker who decides to take them on as a feature; they can
improve them incrementally without needing to fix 100% of issues in the
first go.Is there anything currently in contrib that defines its own WAL
records and replay methods? Are there hooks for doing so?It's not really possible as rmgr.c declares a const array of resource managers.
A contrib module can't sensibly add itself to that. I think changing this has
been discussed/proposed in the past, but -hackers wasn't convinced...But then, the idea is to add it to -contrib while no WAL support exists..
Which then virtually guarantees that WAL support never will exist, doesn't it?
Personally I don't see a point in -contrib'ing it. I would rather see it throw
errors in dangerous situations and be done with that.+1
All I'm planning to do for now is add docs and the WARNING suggested.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, 2012-10-15 at 11:14 -0700, Josh Berkus wrote:
I'd be in favor of a warning on create index.
Only if you can turn it off, please.
But I don't think a warning is appropriate if the statement does exactly
what the user wanted. The place to point out shortcomings of the
implementation is in the documentation.
On 16 October 2012 04:49, Peter Eisentraut <peter_e@gmx.net> wrote:
On Mon, 2012-10-15 at 11:14 -0700, Josh Berkus wrote:
I'd be in favor of a warning on create index.
Only if you can turn it off, please.
But I don't think a warning is appropriate if the statement does exactly
what the user wanted. The place to point out shortcomings of the
implementation is in the documentation.
Fair comment. I've just added a caution in the docs.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services