Collation mega-cleanups

Started by Bruce Momjianover 14 years ago10 messages
#1Bruce Momjian
bruce@momjian.us

Tom this collation stuff has seen more post-feature-commit cleanups than
I think any patch I remember. Is there anything we can learn from this?

Yes, this is coming from me, who some consider to be the king of
post-commit cleanups, namely, cleaning up my own commits.

---------------------------------------------------------------------------

Tom Lane wrote:

I just noticed that the collation patch has modified char2wchar and
wchar2char to accept a collation OID as argument ... but it hasn't done
anything to make those arguments actually work. Since those functions
depend on wcstombs and mbstowcs, which respond to LC_CTYPE and nothing
else, this flat out does not work in non-default collations. What's
more, there doesn't seem to be any such thing as wcstombs_l or
mbstowcs_l (at least my Fedora box hasn't got them), so this can't be
fixed within the available glibc API.

Right at the moment this only affects str_tolower, str_toupper, and
str_initcap; there are other uses of these functions in the text search
code, but those always pass DEFAULT_COLLATION_OID.

It's possible that things are not too broken in practice, because it's
likely that the transformations done by these functions only depend on
the encoding indicated by LC_CTYPE, and we (try to) enforce that all
locales used in a given database match the database encoding. Still,
that's a rather shaky chain of reasoning.

The complete lack of code comments on this doesn't make me any happier
--- in fact, the comments for char2wchar and wchar2char still claim that
they have the same API as wcstombs and mbstowcs, which can hardly be
considered true when they don't even have the same argument lists.

Any thoughts what to do about this?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#1)
Re: Collation mega-cleanups

Bruce Momjian <bruce@momjian.us> writes:

Tom this collation stuff has seen more post-feature-commit cleanups than
I think any patch I remember. Is there anything we can learn from this?

The pre-commit review was obviously woefully inadequate.

regards, tom lane

#3Robert Haas
robertmhaas@gmail.com
In reply to: Bruce Momjian (#1)
Re: Collation mega-cleanups

On Mon, May 9, 2011 at 2:58 PM, Bruce Momjian <bruce@momjian.us> wrote:

Tom this collation stuff has seen more post-feature-commit cleanups than
I think any patch I remember.  Is there anything we can learn from this?

How about "don't commit all the large patches at the end of the cycle"?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#4Ross J. Reedstrom
reedstrm@rice.edu
In reply to: Robert Haas (#3)
Re: Collation mega-cleanups

On Mon, May 09, 2011 at 03:57:12PM -0400, Robert Haas wrote:

On Mon, May 9, 2011 at 2:58 PM, Bruce Momjian <bruce@momjian.us> wrote:

Tom this collation stuff has seen more post-feature-commit cleanups than
I think any patch I remember. �Is there anything we can learn from this?

How about "don't commit all the large patches at the end of the cycle"?

My take home from following this is: 'Even Tom can get caught in the
"just one more little change" trap'

:-)

Ross
--
Ross Reedstrom, Ph.D. reedstrm@rice.edu
Systems Engineer & Admin, Research Scientist phone: 713-348-6166
Connexions http://cnx.org fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E F888 D3AE 810E 88F0 BEDE

#5Andres Freund
andres@anarazel.de
In reply to: Ross J. Reedstrom (#4)
Re: Collation mega-cleanups

On Tuesday, May 10, 2011 07:08:23 PM Ross J. Reedstrom wrote:

On Mon, May 09, 2011 at 03:57:12PM -0400, Robert Haas wrote:

On Mon, May 9, 2011 at 2:58 PM, Bruce Momjian <bruce@momjian.us> wrote:

Tom this collation stuff has seen more post-feature-commit cleanups
than I think any patch I remember. Is there anything we can learn
from this?

How about "don't commit all the large patches at the end of the cycle"?

My take home from following this is: 'Even Tom can get caught in the
"just one more little change" trap'

I don't think any of the changes from Tom deserves that categorization.

Andres

#6Ross J. Reedstrom
reedstrm@rice.edu
In reply to: Andres Freund (#5)
Re: Collation mega-cleanups

On Tue, May 10, 2011 at 07:21:16PM +0200, Andres Freund wrote:

On Tuesday, May 10, 2011 07:08:23 PM Ross J. Reedstrom wrote:

On Mon, May 09, 2011 at 03:57:12PM -0400, Robert Haas wrote:

On Mon, May 9, 2011 at 2:58 PM, Bruce Momjian <bruce@momjian.us> wrote:

Tom this collation stuff has seen more post-feature-commit cleanups
than I think any patch I remember. Is there anything we can learn
from this?

How about "don't commit all the large patches at the end of the cycle"?

My take home from following this is: 'Even Tom can get caught in the
"just one more little change" trap'

I don't think any of the changes from Tom deserves that categorization.

No disrespect intended, far from it. The trap is that something at seems
at a distance as relatively small can grow on closer inspection. Which I
think is exactly what Tom said (paraphrased) "The pre-commit review was
insufficent" i.e. the remaining problems seemed little, but were not.

In addition, "little" is relative to who's doing the changes, over what
domain. Things that are little for Tom on PostgreSQL would not be so
for me. Presumably the inverse is true over other domains.

So perhaps it was more of the "This code is less ready than I thought
it was, but now that I've spent the time understanding it and the
problem, the shortest way out is forward". I think we've all been in
that swamp, at one time or another.

Ross
--
Ross Reedstrom, Ph.D. reedstrm@rice.edu
Systems Engineer & Admin, Research Scientist phone: 713-348-6166
Connexions http://cnx.org fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E F888 D3AE 810E 88F0 BEDE

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ross J. Reedstrom (#6)
Re: Collation mega-cleanups

"Ross J. Reedstrom" <reedstrm@rice.edu> writes:

So perhaps it was more of the "This code is less ready than I thought
it was, but now that I've spent the time understanding it and the
problem, the shortest way out is forward".

Yeah, exactly. By the time I really understood how incomplete the
collation patch was, I'd done most of the work to fix it; and insisting
on backing it out of 9.1 didn't seem productive (even assuming that I
could have won that argument, which was by no means a given).

I'm still fairly troubled by the potential overhead in the form of extra
lookups during parse time, but have not had the time to try to measure
that. Too bad we haven't got a performance-test farm.

regards, tom lane

#8Peter Eisentraut
peter_e@gmx.net
In reply to: Bruce Momjian (#1)
Re: Collation mega-cleanups

On mån, 2011-05-09 at 14:58 -0400, Bruce Momjian wrote:

Tom this collation stuff has seen more post-feature-commit cleanups
than I think any patch I remember. Is there anything we can learn
from this?

Don't do big patches?

Seriously, it looks pretty bad, but this is one of the biggest feature
patches in the last 5 years, it touches many places all over the system,
and there is a reason why this topic has been on the TODO list for 10
years: it's overwhelming. I had aimed for a 75% solution: have
something that supports useful cases, that doesn't break anything if you
don't use it, and that can be expanded later. Now maybe I only reached
70%, and maybe the baseline should have been 80%, but what we now have
is more like 107% and includes a handful of features I had explicitly
excluded from the first round.

The patch has been around for 10 months, it has been in every commit
fest, it has tests and documentation, it has been reviewed a bunch of
times, people evidently read (some of) the code, they gave feedback,
adjustments have been made (some reverted during later cleanup, go
figure), performance was questioned, performance tests were done,
adjustments were made, people told me to commit it, so I did, if people
had told me to revert it, I would have reverted it. What can we learn
from that? The bigger your patch, the lonelier you are.

#9Bruce Momjian
bruce@momjian.us
In reply to: Peter Eisentraut (#8)
Re: Collation mega-cleanups

Peter Eisentraut wrote:

from that? The bigger your patch, the lonelier you are.

I can attest to that.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#8)
Re: Collation mega-cleanups

Peter Eisentraut <peter_e@gmx.net> writes:

Seriously, it looks pretty bad, but this is one of the biggest feature
patches in the last 5 years, it touches many places all over the system,
and there is a reason why this topic has been on the TODO list for 10
years: it's overwhelming.

Yeah. I did not want to press for reverting, because it seemed clear
to me that there was no way that this feature would ever get in if we
insisted that it be 100% right when committed. My idea of "good enough"
kept moving the more I looked at the patch, though, and it's still
moving --- now I think that we really need to fix the lack of preloaded
pg_collation entries for Windows, and then get in a regression test that
runs everywhere. So if you want to call that feature creep, go ahead.

regards, tom lane