[PATCH] cleanup hashindex for pg_migrator hashindex compat mode (for 8.4)
Attached patch cleanups hash index headers to allow compile hasham for
8.3 version. It helps to improve pg_migrator with capability to migrate
database with hash index without reindexing.
I discussed this patch year ago with Alvaro when we tried to cleanup
include bloating problem. It should reduce also number of including.
The main point is that hash functions for datatypes are now in related
data files in utils/adt directory. hash_any() and hash_uint32 it now in
utils/hashfunc.c.
It would be nice to have this in 8.4 because it allows to test index
migration functionality.
Thanks Zdenek
Attachments:
hash.patchtext/x-patch; CHARSET=US-ASCII; name=hash.patchDownload+577-568
I forgot to fix contrib. Updated patch attached.
Zdenek
Zdenek Kotala píše v pá 22. 05. 2009 v 16:23 -0400:
Show quoted text
Attached patch cleanups hash index headers to allow compile hasham for
8.3 version. It helps to improve pg_migrator with capability to migrate
database with hash index without reindexing.I discussed this patch year ago with Alvaro when we tried to cleanup
include bloating problem. It should reduce also number of including.The main point is that hash functions for datatypes are now in related
data files in utils/adt directory. hash_any() and hash_uint32 it now in
utils/hashfunc.c.It would be nice to have this in 8.4 because it allows to test index
migration functionality.Thanks Zdenek
Attachments:
hash_02.patch.gzapplication/x-gzip; name=hash_02.patch.gzDownload
On Sat, May 23, 2009 at 02:52:49PM -0400, Zdenek Kotala wrote:
I forgot to fix contrib. Updated patch attached.
Zdenek
Zdenek Kotala p????e v p?? 22. 05. 2009 v 16:23 -0400:
Attached patch cleanups hash index headers to allow compile hasham for
8.3 version. It helps to improve pg_migrator with capability to migrate
database with hash index without reindexing.I discussed this patch year ago with Alvaro when we tried to cleanup
include bloating problem. It should reduce also number of including.The main point is that hash functions for datatypes are now in related
data files in utils/adt directory. hash_any() and hash_uint32 it now in
utils/hashfunc.c.It would be nice to have this in 8.4 because it allows to test index
migration functionality.Thanks Zdenek
How does that work with the updated hash functions without a reindex?
Regards,
Ken
Kenneth Marshall <ktm@rice.edu> writes:
On Sat, May 23, 2009 at 02:52:49PM -0400, Zdenek Kotala wrote:
Attached patch cleanups hash index headers to allow compile hasham for
8.3 version. It helps to improve pg_migrator with capability to migrate
database with hash index without reindexing.
How does that work with the updated hash functions without a reindex?
I looked at this patch and I don't see how it helps pg_migrator at all.
It's just pushing some code and function declarations around.
The rearrangement might be marginally nicer from a code beautification
point of view --- right now we're a bit inconsistent about whether
datatype-specific hash functions live in hashfunc.c or in the datatype's
utils/adt/ file. But I'm not sure that removing hashfunc.c altogether is
an appropriate solution to that, not least because of the loss of CVS
history for the functions. I'd be inclined to leave the core hash_any()
code where it is, if not all of these functions altogether.
What does seem useful is to refactor the headers so that datatype hash
functions don't need to include all of the AM's implementation details.
But this patch seems to do both more and less than that --- I don't
think it's gotten rid of all external #includes of access/hash.h, and
in any case moving the function code is not necessary to that goal.
In any case, the barriers to implementing 8.3-style hash indexes in 8.4
are pretty huge: you'd need to duplicate not only the hash AM code, but
also all the hash functions, and therefore all of the hash pg_amop and
pg_amproc entries. Given the close-to-zero usefulness of hash indexes
in production installations, I don't think it's worth the trouble. It
would be much more helpful to look into supporting 8.3-compatible GIN
indexes.
regards, tom lane
Hi,
Tom Lane <tgl@sss.pgh.pa.us> writes:
The rearrangement might be marginally nicer from a code beautification
point of view --- right now we're a bit inconsistent about whether
datatype-specific hash functions live in hashfunc.c or in the datatype's
utils/adt/ file. But I'm not sure that removing hashfunc.c altogether is
an appropriate solution to that, not least because of the loss of CVS
history for the functions. I'd be inclined to leave the core hash_any()
code where it is, if not all of these functions altogether.
I guess someone has to talk about it: git will follow the code even when
the file hosting it changes. It's not all magic though:
http://kerneltrap.org/node/11765
"And when using git, the whole 'keep code movement separate from
changes' has an even more fundamental reason: git can track code
movement (again, whether moving a whole file or just a function
between files), and doing a 'git blame -C' will actually follow code
movement between files. It does that by similarity analysis, but it
does mean that if you both move the code *and* change it at the same
time, git cannot see that 'oh, that function came originally from that
other file', and now you get worse annotations about where code
actually originated."
Having better tools maybe could help maintain the high quality standards
that are established code wise, too.
Regards,
--
dim
Dimitri Fontaine <dfontaine@hi-media.com> writes:
Tom Lane <tgl@sss.pgh.pa.us> writes:
The rearrangement might be marginally nicer from a code beautification
point of view --- right now we're a bit inconsistent about whether
datatype-specific hash functions live in hashfunc.c or in the datatype's
utils/adt/ file. But I'm not sure that removing hashfunc.c altogether is
an appropriate solution to that, not least because of the loss of CVS
history for the functions. I'd be inclined to leave the core hash_any()
code where it is, if not all of these functions altogether.
I guess someone has to talk about it: git will follow the code even when
the file hosting it changes.
That might possibly be relevant a year from now; it is 100% irrelevant
to a change being proposed for 8.4.
regards, tom lane
On Mon, May 25, 2009 at 09:59:14AM -0400, Tom Lane wrote:
Dimitri Fontaine <dfontaine@hi-media.com> writes:
Tom Lane <tgl@sss.pgh.pa.us> writes:
The rearrangement might be marginally nicer from a code
beautification point of view --- right now we're a bit
inconsistent about whether datatype-specific hash functions live
in hashfunc.c or in the datatype's utils/adt/ file. But I'm not
sure that removing hashfunc.c altogether is an appropriate
solution to that, not least because of the loss of CVS history
for the functions. I'd be inclined to leave the core hash_any()
code where it is, if not all of these functions altogether.I guess someone has to talk about it: git will follow the code
even when the file hosting it changes.That might possibly be relevant a year from now; it is 100%
irrelevant to a change being proposed for 8.4.
It's pretty relevant as far as the schedule goes. I'm not alone
thinking that the appropriate place to make this change, given
buildfarm support, is at the transition to 8.5.
CVS is dead. Long live git! :)
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Tom Lane píše v ne 24. 05. 2009 v 18:46 -0400:
Kenneth Marshall <ktm@rice.edu> writes:
On Sat, May 23, 2009 at 02:52:49PM -0400, Zdenek Kotala wrote:
Attached patch cleanups hash index headers to allow compile hasham for
8.3 version. It helps to improve pg_migrator with capability to migrate
database with hash index without reindexing.How does that work with the updated hash functions without a reindex?
I looked at this patch and I don't see how it helps pg_migrator at all.
It's just pushing some code and function declarations around.
The main important thing is move hash_any/hash_uint32 function from hash
am. It should reduce amount of duplicated code necessary for old hash
index implementation. Rest is only rearrangement and cleanup.
The rearrangement might be marginally nicer from a code beautification
point of view --- right now we're a bit inconsistent about whether
datatype-specific hash functions live in hashfunc.c or in the datatype's
utils/adt/ file.
I personally prefer to keep it in type definition. AM should be
independent on types which are installed and data type code should be
self contained.
But I'm not sure that removing hashfunc.c altogether is
an appropriatera solution to that, not least because of the loss of CVS
history for the functions. I'd be inclined to leave the core hash_any()
code where it is, if not all of these functions altogether.
Until we will have better version control system, hashfunc.c can stay
here, but what I need is hashfunc.h. Minimalistic version of this patch
is to create hashfuct.h and modified related #include in C and header
files.
What does seem useful is to refactor the headers so that datatype hash
functions don't need to include all of the AM's implementation details.
But this patch seems to do both more and less than that --- I don't
think it's gotten rid of all external #includes of access/hash.h, and
in any case moving the function code is not necessary to that goal.
Agree, I will prepare minimalistic version with hashfunc.h only.
In any case, the barriers to implementing 8.3-style hash indexes in 8.4
are pretty huge: you'd need to duplicate not only the hash AM code, but
also all the hash functions, and therefore all of the hash pg_amop and
pg_amproc entries.
I'm not sure if I need duplicate functions. Generally yes but It seems
to me that hash index does not changed functions behavior and they could
be shared at this moment.
Given the close-to-zero usefulness of hash indexes
in production installations, I don't think it's worth the trouble. It
would be much more helpful to look into supporting 8.3-compatible GIN
indexes.
Agree, I wanted to quickly verify function naming collision problem and
HASH index seems to me better candidate for this basic test. GIN has
priority.
Zdenek
David,
* David Fetter (david@fetter.org) wrote:
It's pretty relevant as far as the schedule goes. I'm not alone
thinking that the appropriate place to make this change, given
buildfarm support, is at the transition to 8.5.CVS is dead. Long live git! :)
I'm all for moving to git, but not until at least the core folks are
more familiar with it and have been using it. I don't believe that
experience will be there by the time we open for 8.5 and a forced march
when we have numerous big things hopefully hitting on the first
commitfest seems like a bad idea.
I would encourage core, committers and contributors to start becoming
familiar with git on the expectation that we'll be making that move
when we open for 8.6/9.0.
Ideally, there could be an official decision made about when it's going
to happen followed by an announcment when 8.4 is released.
Thoughts?
Stephen
On Mon, May 25, 2009 at 11:45:33AM -0400, Stephen Frost wrote:
David,
* David Fetter (david@fetter.org) wrote:
It's pretty relevant as far as the schedule goes. I'm not alone
thinking that the appropriate place to make this change, given
buildfarm support, is at the transition to 8.5.CVS is dead. Long live git! :)
I'm all for moving to git, but not until at least the core folks are
more familiar with it and have been using it.
Which ones aren't familiar and haven't been using it for at least the
past year? I count two.
I don't believe that experience will be there by the time we open
for 8.5 and a forced march when we have numerous big things
hopefully hitting on the first commitfest seems like a bad idea.
Your portrayal of a rough and complicated transition is not terribly
well supported by other projects' switches to git.
I would encourage core, committers and contributors to start
becoming familiar with git on the expectation that we'll be making
that move when we open for 8.6/9.0.Ideally, there could be an official decision made about when it's
going to happen followed by an announcment when 8.4 is released.Thoughts?
Here's mine: Git delayed is git denied.
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter wrote:
On Mon, May 25, 2009 at 09:59:14AM -0400, Tom Lane wrote:
Dimitri Fontaine <dfontaine@hi-media.com> writes:
Tom Lane <tgl@sss.pgh.pa.us> writes:
The rearrangement might be marginally nicer from a code
beautification point of view --- right now we're a bit
inconsistent about whether datatype-specific hash functions live
in hashfunc.c or in the datatype's utils/adt/ file. But I'm not
sure that removing hashfunc.c altogether is an appropriate
solution to that, not least because of the loss of CVS history
for the functions. I'd be inclined to leave the core hash_any()
code where it is, if not all of these functions altogether.I guess someone has to talk about it: git will follow the code
even when the file hosting it changes.That might possibly be relevant a year from now; it is 100%
irrelevant to a change being proposed for 8.4.It's pretty relevant as far as the schedule goes. I'm not alone
thinking that the appropriate place to make this change, given
buildfarm support, is at the transition to 8.5.CVS is dead. Long live git! :)
That still misses Tom's point, since the change is proposed for 8.4 and
at the earliest we would not change SCCMs until after 8.4 is released
(and, notwithstanding your eagerness, I suspect it will be rather later).
cheers
andrew
David Fetter <david@fetter.org> writes:
On Mon, May 25, 2009 at 11:45:33AM -0400, Stephen Frost wrote:
I'm all for moving to git, but not until at least the core folks are
more familiar with it and have been using it.
Which ones aren't familiar and haven't been using it for at least the
past year? I count two.
I'm not familiar with it, and neither is Bruce, and frankly that's
entirely sufficient reason not to change now.
What was more or less agreed to at the developer's meeting was that
we would move towards git in an orderly fashion. I'm thinking something
like six months to a year before cutting over the core repository.
If you'd like to accomplish something *useful* about this, how about
pestering git upstream to support diff -c output format?
regards, tom lane
On Mon, May 25, 2009 at 12:24:05PM -0400, Tom Lane wrote:
David Fetter <david@fetter.org> writes:
On Mon, May 25, 2009 at 11:45:33AM -0400, Stephen Frost wrote:
I'm all for moving to git, but not until at least the core folks are
more familiar with it and have been using it.Which ones aren't familiar and haven't been using it for at least
the past year? I count two.I'm not familiar with it, and neither is Bruce, and frankly that's
entirely sufficient reason not to change now.What was more or less agreed to at the developer's meeting was that
we would move towards git in an orderly fashion.
The rest have already been moving to it in "an orderly fashion," some
for over than a year.
I'm thinking something like six months to a year before cutting over
the core repository.
What would gate that?
If you'd like to accomplish something *useful* about this, how about
pestering git upstream to support diff -c output format?
I've been pestering them :)
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On Mon, May 25, 2009 at 12:24:05PM -0400, Tom Lane wrote:
If you'd like to accomplish something *useful* about this, how about
pestering git upstream to support diff -c output format?
It looks like this is doable with a suitable git configuration file
such as $HOME/.gitconfig or (finer grain) a .git/config for the
repository :)
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
Tom Lane píše v ne 24. 05. 2009 v 18:46 -0400:
In any case, the barriers to implementing 8.3-style hash indexes in 8.4
are pretty huge: you'd need to duplicate not only the hash AM code, but
also all the hash functions, and therefore all of the hash pg_amop and
pg_amproc entries.
I'm not sure if I need duplicate functions. Generally yes but It seems
to me that hash index does not changed functions behavior and they could
be shared at this moment.
No, the behavior of the hash functions themselves changed during 8.4.
Twice, even:
2008-04-06 12:54 tgl
* contrib/dblink/expected/dblink.out,
contrib/dblink/sql/dblink.sql, src/backend/access/hash/hashfunc.c,
src/include/catalog/catversion.h,
src/test/regress/expected/portals.out,
src/test/regress/sql/portals.sql: Improve hash_any() to use
word-wide fetches when hashing suitably aligned data. This makes
for a significant speedup at the cost that the results now vary
between little-endian and big-endian machines; which forces us to
add explicit ORDER BYs in a couple of regression tests to preserve
machine-independent comparison results. Also, force initdb by
bumping catversion, since the contents of hash indexes will change
(at least on big-endian machines).
Kenneth Marshall and Tom Lane, based on work from Bob Jenkins.
This commit does not adopt Bob's new faster mix() algorithm,
however, since we still need to convince ourselves that that
doesn't degrade the quality of the hashing.
2009-02-09 16:18 tgl
* src/: backend/access/hash/hashfunc.c,
include/catalog/catversion.h,
test/regress/expected/polymorphism.out,
test/regress/expected/union.out, test/regress/sql/polymorphism.sql:
Adopt Bob Jenkins' improved hash function for hash_any(). This
changes the contents of hash indexes (again), so bump catversion.
Kenneth Marshall
So as far as I can see, you need completely separate copies of both
hash_any() and the SQL-level functions that call it. I'm not really
seeing that the proposed refactoring makes this any easier. You might
as well just copy-and-paste all that old code into a separate set of
files, and not worry about what is in access/hash.h.
regards, tom lane
David Fetter wrote:
On Mon, May 25, 2009 at 12:24:05PM -0400, Tom Lane wrote:
If you'd like to accomplish something *useful* about this, how about
pestering git upstream to support diff -c output format?It looks like this is doable with a suitable git configuration file
such as $HOME/.gitconfig or (finer grain) a .git/config for the
repository :)
Can you be more specific on the necessary contents of such file?
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
David Fetter <david@fetter.org> writes:
On Mon, May 25, 2009 at 12:24:05PM -0400, Tom Lane wrote:
If you'd like to accomplish something *useful* about this, how about
pestering git upstream to support diff -c output format?
It looks like this is doable with a suitable git configuration file
such as $HOME/.gitconfig or (finer grain) a .git/config for the
repository :)
Cool, let's see one.
If we were to put it into a repository config file, that would more or
less have the effect of enforcing a project style for diffs, no?
regards, tom lane
On 05/25/2009 07:20 PM, Alvaro Herrera wrote:
David Fetter wrote:
On Mon, May 25, 2009 at 12:24:05PM -0400, Tom Lane wrote:
If you'd like to accomplish something *useful* about this, how about
pestering git upstream to support diff -c output format?It looks like this is doable with a suitable git configuration file
such as $HOME/.gitconfig or (finer grain) a .git/config for the
repository :)Can you be more specific on the necessary contents of such file?
A very sketchy notion of it is at:
http://wiki.postgresql.org/wiki/Talk:Working_with_Git
I will try to correct the wording + windows information after eating.
Andres
On 05/25/2009 07:31 PM, Tom Lane wrote:
David Fetter<david@fetter.org> writes:
On Mon, May 25, 2009 at 12:24:05PM -0400, Tom Lane wrote:
If you'd like to accomplish something *useful* about this, how about
pestering git upstream to support diff -c output format?It looks like this is doable with a suitable git configuration file
such as $HOME/.gitconfig or (finer grain) a .git/config for the
repository :)Cool, let's see one.
If we were to put it into a repository config file, that would more or
less have the effect of enforcing a project style for diffs, no?
Yes and no.
You can define that a subset (or all) files use a specific "diff driver"
in the repository - unfortunately the definition of that driver has to
be done locally. Defining it currently involves installing a wrapper
like the one on http://wiki.postgresql.org/wiki/Talk:Working_with_Git
and doing
Andres
On 05/25/2009 07:53 PM, Andres Freund wrote:
On 05/25/2009 07:31 PM, Tom Lane wrote:
David Fetter<david@fetter.org> writes:
On Mon, May 25, 2009 at 12:24:05PM -0400, Tom Lane wrote:
If you'd like to accomplish something *useful* about this, how about
pestering git upstream to support diff -c output format?It looks like this is doable with a suitable git configuration file
such as $HOME/.gitconfig or (finer grain) a .git/config for the
repository :)Cool, let's see one.
If we were to put it into a repository config file, that would more or
less have the effect of enforcing a project style for diffs, no?Yes and no.
You can define that a subset (or all) files use a specific "diff driver"
in the repository - unfortunately the definition of that driver has to
be done locally. Defining it currently involves installing a wrapper
like the one on http://wiki.postgresql.org/wiki/Talk:Working_with_Git
and doing
Ugh, hit the wrong key:
and executing
`git config --global diff.context.command "git-external-diff"`
Andres