Need more reviewers!
Hackers,
We currently have 38 patches pending, and only nine people reviewing them.
At this rate, the September commitfest will take three months.
If you are a postgresql hacker at all, or even want to be one, we need your
help reviewing patches! There are several "easy" patches in the list, so
I can assign them to beginners.
Please volunteer now!
--
--Josh
Josh Berkus
PostgreSQL
San Francisco
On Thu, Sep 4, 2008 at 1:45 PM, Josh Berkus <josh@agliodbs.com> wrote:
We currently have 38 patches pending, and only nine people reviewing them.
At this rate, the September commitfest will take three months.
I'll push forward on reviewing and testing Xiao's hash index
improvements for inclusion into core. Though, someone will still need
to review my stuff.
--
Jonah H. Harris, Senior DBA
myYearbook.com
"Jonah H. Harris" <jonah.harris@gmail.com> writes:
I'll push forward on reviewing and testing Xiao's hash index
improvements for inclusion into core. Though, someone will still need
to review my stuff.
I think what the hash index patch really needs is some performance
testing. I'm willing to take responsibility for the code being okay
or not, but I haven't got any production-grade hardware to do realistic
performance tests on. I'd like to see a few more scenarios tested than
the one provided so far: in particular, wide vs narrow index keys and
good vs bad key distributions.
It strikes me that there's an intermediate posture that the patch
doesn't provide: namely, store *both* hash code and original index key
in each index entry. This would make the index larger rather than
smaller, but you could still do the intra-page sort by hash code,
which I think is likely a big win. In exchange for the bigger index
you'd avoid fruitless trips to the heap for chance hashcode matches.
So it seems far from clear to me that the hash-code-only solution is
necessarily the best.
If anyone is willing to do comparative performance testing, I'll
volunteer to make up two variant patches that do it both ways and
are otherwise equivalent.
(I wouldn't be too surprised if testing shows that each way could be
a win depending on the situation. In that case we could think about
how to provide a switch to let users pick the method. But for the
moment let's just test two separate patches.)
regards, tom lane
On Thu, Sep 04, 2008 at 02:01:18PM -0400, Tom Lane wrote:
"Jonah H. Harris" <jonah.harris@gmail.com> writes:
I'll push forward on reviewing and testing Xiao's hash index
improvements for inclusion into core. Though, someone will still need
to review my stuff.I think what the hash index patch really needs is some performance
testing. I'm willing to take responsibility for the code being okay
or not, but I haven't got any production-grade hardware to do realistic
performance tests on. I'd like to see a few more scenarios tested than
the one provided so far: in particular, wide vs narrow index keys and
good vs bad key distributions.
Tom,
Do you mean good vs. bad key distributions with respect to hash value
collisions?
It strikes me that there's an intermediate posture that the patch
doesn't provide: namely, store *both* hash code and original index key
in each index entry. This would make the index larger rather than
smaller, but you could still do the intra-page sort by hash code,
which I think is likely a big win. In exchange for the bigger index
you'd avoid fruitless trips to the heap for chance hashcode matches.
So it seems far from clear to me that the hash-code-only solution is
necessarily the best.
Currently, since a trip to the heap is required to validate any tuple
even if the exact value is contained in the index, it does not seem
like it would be a win to store the value in both places. One of the
big improvements is the space efficiency of the index obtained by
just storing the hash value. I think that increasing the number of
hash bits stored would provide more bang-for-the-buck than storing
the exact value. One easy way to provide this functionality without
increasing the storage requirements is to take advantage of the
knowledge of the bucket number to increase the number of bit, i.e.
at 256 buckets, remove the first 8-bits of the hash and add 8-bits
to provide a 40-bit hash in the same storage. At 64k buckets, you
can now store a 48-bit hash in the same space and at 16m buckets, you
can store a 56-bit hash in the original 32-bit space. So as the number
of buckets increases, the number of hash bits increases as well as the
specificity of the resulting hash thereby reducing the need to visit
the heap tuple store.
Another idea, takes advantage of smaller "bucket" size (I think of
them as sub-buckets) to add an additional 8-bits of hash value at
the 32m or 64m buckets by looking up the hash in one of 2^n sub-buckets
where n = 64 or 128 or even 256. This has the advantage of eliminating
the binary lookup and insertion within the bucket page. Again, this
will need to be tested.
If anyone is willing to do comparative performance testing, I'll
volunteer to make up two variant patches that do it both ways and
are otherwise equivalent.
I am in the boat with you, in that I do not have database systems
with enough hardware to perform realistic testing.
(I wouldn't be too surprised if testing shows that each way could be
a win depending on the situation. In that case we could think about
how to provide a switch to let users pick the method. But for the
moment let's just test two separate patches.)
I think, as of yet un-tested, that where storing both would be of
value when we do not have to visit the heap for visibility information
or when using the space to store more hash bits would be more effective.
Cheers,
Ken
Show quoted text
regards, tom lane
Kenneth Marshall <ktm@rice.edu> writes:
On Thu, Sep 04, 2008 at 02:01:18PM -0400, Tom Lane wrote:
I think what the hash index patch really needs is some performance
testing. I'm willing to take responsibility for the code being okay
or not, but I haven't got any production-grade hardware to do realistic
performance tests on. I'd like to see a few more scenarios tested than
the one provided so far: in particular, wide vs narrow index keys and
good vs bad key distributions.
Do you mean good vs. bad key distributions with respect to hash value
collisions?
Right, I'm just concerned about how badly it degrades with hash value
collisions. Those will result in wasted trips to the heap if we omit
the original key from the index. AFAICS that's the only downside of
doing so; but we should have an idea of how bad it could get before
committing to doing this.
Currently, since a trip to the heap is required to validate any tuple
even if the exact value is contained in the index, it does not seem
like it would be a win to store the value in both places.
The point is that currently you *don't* need a trip to the heap if
the key doesn't match, even if it has the same hash code.
I think that increasing the number of
hash bits stored would provide more bang-for-the-buck than storing
the exact value.
Maybe, but that would require an extremely invasive patch that breaks
existing user-defined datatypes. You can't just magically get more hash
bits from someplace, you need datatype-specific hash functions that will
provide more than 32 bits. There's going to have to be a LOT of
evidence in support of the value of doing that before I'll buy in.
regards, tom lane
On Thu, 2008-09-04 at 14:01 -0400, Tom Lane wrote:
If anyone is willing to do comparative performance testing, I'll
volunteer to make up two variant patches that do it both ways and
are otherwise equivalent.
Why not do both, set via a reloption? We can then set the default to
whichever wins in general testing. There will always be cases where one
or the other is a winner, so having both will be useful.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Thu, 2008-09-04 at 10:45 -0700, Josh Berkus wrote:
We currently have 38 patches pending, and only nine people reviewing them.
At this rate, the September commitfest will take three months.If you are a postgresql hacker at all, or even want to be one, we need your
help reviewing patches! There are several "easy" patches in the list, so
I can assign them to beginners.Please volunteer now!
Everybody is stuck in "I'm not good enough to do a full review". They're
right (myself included), so that just means we're organising it wrongly.
We can't expect to grow more supermen, but we probably can do more
teamwork and delegation.
I think this should be organised with different kinds of reviewer:
* submission review - is patch in standard form, does it apply, does it
have reasonable tests, docs etc.. Little or no knowledge of patch
required to do that. We have at least 50 people can do that.
* usability review - read what patch does. Do we want that? Do we
already have it? Does it follow SQL spec? Are there dangers? Have all
the bases been covered? We have 100s of people who can do that.
* feature test - does feature work as advertised?
* performance review - does the patch slow down simple tests? Does it
speed things up like it says? Does it slow down other things? We have at
least 100 people who can do that.
* coding review - does it follow standard code guidelines? Are there
portability issues? Will it work on Windows/BSD etc? Are there
sufficient comments?
* code review - does it do what it says, correctly?
* architecture review - is everything done in a way that fits together
coherently with other features/modules? are there interdependencies than
can cause problems?
* review review - did the reviewer cover all the things that kind of
reviewer is supposed to do?
If we split up things like this, we'll get a better response. Why not go
to -perform for performance testers, general for usability and feature
testers? If we encourage different types of review, more people will be
willing to step up to doing a small amount for the project. Lots of
eyeballs, not one good pair of eyes.
Also, why has the CommitFest page been hidden? Whats the point of that?
We probably need to offer some training and/or orientation for
prospective reviewers, so people understand what is expected of them,
how to do it and who to tell when they've done it.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs <simon@2ndQuadrant.com> writes:
On Thu, 2008-09-04 at 14:01 -0400, Tom Lane wrote:
If anyone is willing to do comparative performance testing, I'll
volunteer to make up two variant patches that do it both ways and
are otherwise equivalent.
Why not do both, set via a reloption?
That is exactly the code I'm unwilling to write until we find out if
it's needed.
regards, tom lane
On Thu, Sep 4, 2008 at 12:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think what the hash index patch really needs is some performance
testing. I'm willing to take responsibility for the code being okay
or not, but I haven't got any production-grade hardware to do realistic
performance tests on. I'd like to see a few more scenarios tested than
the one provided so far: in particular, wide vs narrow index keys and
good vs bad key distributions.
If anyone is willing to do comparative performance testing, I'll
volunteer to make up two variant patches that do it both ways and
are otherwise equivalent.
I can happily through some hardware at this. Although
"production-grade" is in the eye of the beholder...
Show quoted text
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
"Alex Hunsaker" <badalex@gmail.com> writes:
I can happily through some hardware at this. Although
"production-grade" is in the eye of the beholder...
I just posted a revised patch in the pgsql-patches thread.
regards, tom lane
On Fri, Sep 5, 2008 at 6:54 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On Thu, 2008-09-04 at 10:45 -0700, Josh Berkus wrote:
Please volunteer now!
Everybody is stuck in "I'm not good enough to do a full review". They're
right (myself included), so that just means we're organising it wrongly.
We can't expect to grow more supermen, but we probably can do more
teamwork and delegation.
As a first-time reviewer, I agree with Simon's comments, and I'd like
to make the point that there's currently no written policy for how to
review a patch.
I let Josh know that I was interesting in joining this commitfest as a
"round robin" reviewer, and he's assigned me a patch. Okay. What am
I supposed to do now?
I can certainly download the patch, test it, review the code, and
write my thoughts to the list. I can then add a "review" link to the
wiki page. Assuming I think the patch is acceptable, what then? Do I
hand it off to somebody else for a full review/commit? How do I do
that? etc.
At the moment, for the review virgin, "please volunteer now"
translates roughly as "please elect to join an opaque and undocumented
process which has until now been handled entirely by committers".
That might have something to do with the low turnout.
We have a (really useful) wiki page called "Submitting a Patch". I
think we need one called "Reviewing a Patch".
That way, instead of just an appeal to the masses to volunteer for
$NEBULOUS_TASK, we can say something like "Please volunteer to review
patches. Doing an initial patch review is easy, please see our guide
<link> to learn more."
Cheers,
BJ
Josh Berkus wrote:
Hackers,
We currently have 38 patches pending, and only nine people reviewing them.
At this rate, the September commitfest will take three months.If you are a postgresql hacker at all, or even want to be one, we need your
help reviewing patches! There are several "easy" patches in the list, so
I can assign them to beginners.Please volunteer now!
Please assign me one; any of the "easy" ones.
--
Ibrar Ahmed
EnterpriseDB http://www.enterprisedb.com
On Thu, 2008-09-04 at 10:45 -0700, Josh Berkus wrote:
If you are a postgresql hacker at all, or even want to be one, we need your
help reviewing patches! There are several "easy" patches in the list, so
I can assign them to beginners.
It would be a reasonable rule that all patch submitters also have to do
patch reviews. If we made it a strict rule, then sponsoring companies
would know that they *must* provide money/time for that aspect also.
Otherwise it is almost impossible to get formal approval to do that.
I count more than 20 patch authors that are not reviewing, including
myself. Of that, there are at least 5 people on their second or
subsequent patch (figure probably wildly inaccurate, since don't keep
track of that).
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
That way, instead of just an appeal to the masses to volunteer for
$NEBULOUS_TASK, we can say something like "Please volunteer to review
patches. Doing an initial patch review is easy, please see our guide
<link> to learn more."
+1. I'll review a patch if you like, but the patch I have in this
'fest is only one of two I've ever submitted, and my understanding of
PG guts is very weak, so I confidently predict that me looking at it
has maybe 5% of the value of a committer looking at it, so I'm not
sure that really gets us anywhere. Even if I think the patch is
great, my opinion doesn't and shouldn't carry much weight compared to
someone like Simon or Zdenek who are probably perceived by the
committers as much more likely to have a clue, so the committer is
just going to end up reviewing it all over again anyway.
...Robert
Josh Berkus wrote:
Hackers,
We currently have 38 patches pending, and only nine people reviewing them.
At this rate, the September commitfest will take three months.If you are a postgresql hacker at all, or even want to be one, we need your
help reviewing patches! There are several "easy" patches in the list, so
I can assign them to beginners.Please volunteer now!
Please assign me one; any of the "easy" ones.
--
Ibrar Ahmed
EnterpriseDB http://www.enterprisedb.com
On Fri, 2008-09-05 at 12:18 +0100, Simon Riggs wrote:
It would be a reasonable rule that all patch submitters also have to
do patch reviews.
This is almost the only way to be accepted as a contributor to Fedora --
and I like it.
--
Devrim GÜNDÜZ, RHCE
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org
Simon Riggs wrote:
On Thu, 2008-09-04 at 10:45 -0700, Josh Berkus wrote:
If you are a postgresql hacker at all, or even want to be one, we need your
help reviewing patches! There are several "easy" patches in the list, so
I can assign them to beginners.It would be a reasonable rule that all patch submitters also have to do
patch reviews. If we made it a strict rule, then sponsoring companies
would know that they *must* provide money/time for that aspect also.
Otherwise it is almost impossible to get formal approval to do that.
All this would do is to deter people from submitting patches. Hard rules
like this don't work in FOSS communities. I know it's like herding cats,
but persuasion is really our only tool.
cheers
andrew
On Fri, 2008-09-05 at 09:19 -0400, Andrew Dunstan wrote:
Simon Riggs wrote:
On Thu, 2008-09-04 at 10:45 -0700, Josh Berkus wrote:
If you are a postgresql hacker at all, or even want to be one, we need your
help reviewing patches! There are several "easy" patches in the list, so
I can assign them to beginners.It would be a reasonable rule that all patch submitters also have to do
patch reviews. If we made it a strict rule, then sponsoring companies
would know that they *must* provide money/time for that aspect also.
Otherwise it is almost impossible to get formal approval to do that.
All this would do is to deter people from submitting patches. Hard rules
like this don't work in FOSS communities. I know it's like herding cats,
but persuasion is really our only tool.
I don't *want* the rule, I just think we *need* the rule because
otherwise sponsors/managers/etc make business decisions to exclude that
aspect of the software dev process.
Otherwise we have a patch-and-dump culture that is unsustainable because
a few people's benevolence as reviewers turns everything into a
bottleneck. It doesn't need to mean loss of control for core and
committers.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160
I don't *want* the rule, I just think we *need* the rule because
otherwise sponsors/managers/etc make business decisions to exclude that
aspect of the software dev process.
How exactly would you even begin to enforce such a rule? Retroactively
pull otherwise vali patches from the queue? Ban people from sending
email to the -patches list?
Otherwise we have a patch-and-dump culture that is unsustainable because
a few people's benevolence as reviewers turns everything into a
bottleneck. It doesn't need to mean loss of control for core and
committers.
That problem needs a solution, but not the one you proposed.
- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200809050953
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----
iEYEAREDAAYFAkjBOdIACgkQvJuQZxSWSsiFoACgoqOgumuuZq6z2HBPSAPZUWHd
kS0An2TgFmOLTgdFWuLkpazFbECY4nnz
=ZrYl
-----END PGP SIGNATURE-----
Hi,
Simon Riggs wrote:
On Fri, 2008-09-05 at 09:19 -0400, Andrew Dunstan wrote:
All this would do is to deter people from submitting patches. Hard rules
like this don't work in FOSS communities. I know it's like herding cats,
but persuasion is really our only tool.
+1
I don't *want* the rule, I just think we *need* the rule because
otherwise sponsors/managers/etc make business decisions to exclude that
aspect of the software dev process.
I agree that making sponsors/managers/etc aware of that aspect of the
dev process is necessary and worthwhile. However, I don't think a rule
for *patch submitters* helps with that. There must be other ways to
convince managers to encourage reviewers.
Regards
Markus Wanner