TRUE/FALSE vs true/false

Started by Boszormenyi Zoltanover 14 years ago23 messages
#1Boszormenyi Zoltan
zb@cybertec.at

Hi,

I looked at b4fbe392f8ff6ff1a66b488eb7197eef9e1770a4
and I noticed that it's using TRUE, FALSE, true and false
inconsistently:

@@ -248,6 +249,7 @@ CreateSharedInvalidationState(void)
                shmInvalBuffer->procState[i].nextMsgNum = 0;    /* meaningless */
                shmInvalBuffer->procState[i].resetState = false;
                shmInvalBuffer->procState[i].signaled = false;
+               shmInvalBuffer->procState[i].hasMessages = false;
                shmInvalBuffer->procState[i].nextLXID = InvalidLocalTransactionId;
        }
 }

@@ -316,6 +316,7 @@ SharedInvalBackendInit(bool sendOnly)
stateP->nextMsgNum = segP->maxMsgNum;
stateP->resetState = false;
stateP->signaled = false;
+ stateP->hasMessages = false;
stateP->sendOnly = sendOnly;

LWLockRelease(SInvalWriteLock);

@@ -459,6 +461,19 @@ SIInsertDataEntries(const SharedInvalidationMessage *data, int n)
SpinLockRelease(&vsegP->msgnumLock);
}

+               /*
+                * Now that the maxMsgNum change is globally visible, we give
+                * everyone a swift kick to make sure they read the newly added
+                * messages.  Releasing SInvalWriteLock will enforce a full memory
+                * barrier, so these (unlocked) changes will be committed to memory
+                * before we exit the function.
+                */
+               for (i = 0; i < segP->lastBackend; i++)
+               {
+                       ProcState  *stateP = &segP->procState[i];
+                       stateP->hasMessages = TRUE;
+               }
+
                LWLockRelease(SInvalWriteLock);
        }
 }
@@ -499,11 +514,36 @@ SIGetDataEntries(SharedInvalidationMessage *data, int datasize)
...
+        * Note that, if we don't end up reading all of the messages, we had
+        * better be certain to reset this flag before exiting!
+        */
+       stateP->hasMessages = FALSE;
+
@@ -544,10 +584,16 @@ SIGetDataEntries(SharedInvalidationMessage *data, int datasize)
...
        if (stateP->nextMsgNum >= max)
                stateP->signaled = false;
+       else
+               stateP->hasMessages = TRUE;

Also, grepping for checking for or assigning bool values reveal that
"true" and "false" are used far more than "TRUE" and "FALSE":

[zozo@localhost backend]$ find . -name "*.c" | xargs grep -w true | grep -v 'true"' | grep
= | wc -l
2446
[zozo@localhost backend]$ find . -name "*.c" | xargs grep -w false | grep -v 'false"' |
grep = | wc -l
2745
[zozo@localhost backend]$ find . -name "*.c" | xargs grep -w TRUE | grep -v 'TRUE"' | grep
= | wc -l
119
[zozo@localhost backend]$ find . -name "*.c" | xargs grep -w FALSE | grep -v 'FALSE"' |
grep = | wc -l
140

Shouldn't these get fixed to be consistent?

Best regards,
Zolt�n B�sz�rm�nyi

--
----------------------------------
Zolt�n B�sz�rm�nyi
Cybertec Sch�nig & Sch�nig GmbH
Gr�hrm�hlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
http://www.postgresql.at/

#2Robert Haas
robertmhaas@gmail.com
In reply to: Boszormenyi Zoltan (#1)
Re: TRUE/FALSE vs true/false

2011/8/4 Boszormenyi Zoltan <zb@cybertec.at>:

Shouldn't these get fixed to be consistent?

I believe I already did. See commit 85b436f7b1f06a6ffa8d2f29b03d6e440de18784.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3Boszormenyi Zoltan
zb@cybertec.at
In reply to: Robert Haas (#2)
Re: TRUE/FALSE vs true/false

2011-08-04 14:32 keltez�ssel, Robert Haas �rta:

2011/8/4 Boszormenyi Zoltan <zb@cybertec.at>:

Shouldn't these get fixed to be consistent?

I believe I already did. See commit 85b436f7b1f06a6ffa8d2f29b03d6e440de18784.

I meant a mass "sed -e 's/TRUE/true/g' -e 's/FALSE/false/g'" run
so all the ~200 occurrences of both "TRUE" and "FALSE" get
converted so the whole source tree is consistent.

--
----------------------------------
Zolt�n B�sz�rm�nyi
Cybertec Sch�nig & Sch�nig GmbH
Gr�hrm�hlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
http://www.postgresql.at/

#4Robert Haas
robertmhaas@gmail.com
In reply to: Boszormenyi Zoltan (#3)
Re: TRUE/FALSE vs true/false

On Thu, Aug 4, 2011 at 8:44 AM, Boszormenyi Zoltan <zb@cybertec.at> wrote:

2011-08-04 14:32 keltezéssel, Robert Haas írta:

2011/8/4 Boszormenyi Zoltan <zb@cybertec.at>:

Shouldn't these get fixed to be consistent?

I believe I already did.  See commit 85b436f7b1f06a6ffa8d2f29b03d6e440de18784.

I meant a mass "sed -e 's/TRUE/true/g'  -e 's/FALSE/false/g'" run
so all the ~200 occurrences of both "TRUE" and "FALSE" get
converted so the whole source tree is consistent.

Oh, I see. Well, I don't care either way, so I'll let others weigh
in. The way it is doesn't bother me, but fixing it doesn't bother me
either.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#5Peter Geoghegan
peter@2ndquadrant.com
In reply to: Robert Haas (#4)
Re: TRUE/FALSE vs true/false

On 4 August 2011 13:57, Robert Haas <robertmhaas@gmail.com> wrote:

Oh, I see.  Well, I don't care either way, so I'll let others weigh
in.  The way it is doesn't bother me, but fixing it doesn't bother me
either.

Idiomatic win32 code uses BOOL and TRUE/FALSE. They are macros defined
somewhere or other.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

#6Peter Eisentraut
peter_e@gmx.net
In reply to: Boszormenyi Zoltan (#3)
Re: TRUE/FALSE vs true/false

On tor, 2011-08-04 at 14:44 +0200, Boszormenyi Zoltan wrote:

2011-08-04 14:32 keltezéssel, Robert Haas írta:

2011/8/4 Boszormenyi Zoltan <zb@cybertec.at>:

Shouldn't these get fixed to be consistent?

I believe I already did. See commit 85b436f7b1f06a6ffa8d2f29b03d6e440de18784.

I meant a mass "sed -e 's/TRUE/true/g' -e 's/FALSE/false/g'" run
so all the ~200 occurrences of both "TRUE" and "FALSE" get
converted so the whole source tree is consistent.

I would be in favor of that.

#7Bruce Momjian
bruce@momjian.us
In reply to: Peter Eisentraut (#6)
Re: TRUE/FALSE vs true/false

On Thu, Aug 4, 2011 at 09:00:11PM +0300, Peter Eisentraut wrote:

On tor, 2011-08-04 at 14:44 +0200, Boszormenyi Zoltan wrote:

2011-08-04 14:32 keltez�ssel, Robert Haas �rta:

2011/8/4 Boszormenyi Zoltan <zb@cybertec.at>:

Shouldn't these get fixed to be consistent?

I believe I already did. See commit 85b436f7b1f06a6ffa8d2f29b03d6e440de18784.

I meant a mass "sed -e 's/TRUE/true/g' -e 's/FALSE/false/g'" run
so all the ~200 occurrences of both "TRUE" and "FALSE" get
converted so the whole source tree is consistent.

I would be in favor of that.

I have implemented this with the patch at:

http://momjian.us/expire/true_diff.txt

I did not modify the Win32 code which traditionally uses uppercase for
TRUE/FALSE.

It would be applied only to HEAD.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#7)
Re: TRUE/FALSE vs true/false

Bruce Momjian <bruce@momjian.us> writes:

On Thu, Aug 4, 2011 at 09:00:11PM +0300, Peter Eisentraut wrote:

On tor, 2011-08-04 at 14:44 +0200, Boszormenyi Zoltan wrote:

I meant a mass "sed -e 's/TRUE/true/g' -e 's/FALSE/false/g'" run
so all the ~200 occurrences of both "TRUE" and "FALSE" get
converted so the whole source tree is consistent.

I would be in favor of that.

I have implemented this with the patch at:
http://momjian.us/expire/true_diff.txt

Does this really do anything for us that will justify the extra
back-patching pain it will cause? I don't see that it's improving
code readability any.

regards, tom lane

#9Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#8)
Re: TRUE/FALSE vs true/false

On Tue, Aug 14, 2012 at 05:34:02PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Thu, Aug 4, 2011 at 09:00:11PM +0300, Peter Eisentraut wrote:

On tor, 2011-08-04 at 14:44 +0200, Boszormenyi Zoltan wrote:

I meant a mass "sed -e 's/TRUE/true/g' -e 's/FALSE/false/g'" run
so all the ~200 occurrences of both "TRUE" and "FALSE" get
converted so the whole source tree is consistent.

I would be in favor of that.

I have implemented this with the patch at:
http://momjian.us/expire/true_diff.txt

Does this really do anything for us that will justify the extra
back-patching pain it will cause? I don't see that it's improving
code readability any.

I think it is more of a consistency issue. There were multiple people
who wanted this change. Of course, some of those people don't backport
stuff.

Other comments?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#10Peter Eisentraut
peter_e@gmx.net
In reply to: Bruce Momjian (#9)
Re: TRUE/FALSE vs true/false

On Tue, 2012-08-14 at 17:36 -0400, Bruce Momjian wrote:

On Tue, Aug 14, 2012 at 05:34:02PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Thu, Aug 4, 2011 at 09:00:11PM +0300, Peter Eisentraut wrote:

On tor, 2011-08-04 at 14:44 +0200, Boszormenyi Zoltan wrote:

I meant a mass "sed -e 's/TRUE/true/g' -e 's/FALSE/false/g'" run
so all the ~200 occurrences of both "TRUE" and "FALSE" get
converted so the whole source tree is consistent.

I would be in favor of that.

I have implemented this with the patch at:
http://momjian.us/expire/true_diff.txt

Does this really do anything for us that will justify the extra
back-patching pain it will cause? I don't see that it's improving
code readability any.

I think it is more of a consistency issue. There were multiple people
who wanted this change. Of course, some of those people don't backport
stuff.

I guess you could argue this particular case without end, but I think we
should be open to these kinds of changes. They will make future code
easier to deal with and confuse new developers less (when to use which,
do they mean different things, etc.).

If back-patching really becomes a problem, we might want to look a
little deeper into git options. I think the Linux kernel people do
these kinds of cleanups more often, so there is probably some better
support for it.

#11Bruce Momjian
bruce@momjian.us
In reply to: Peter Eisentraut (#10)
Re: TRUE/FALSE vs true/false

On Tue, Aug 14, 2012 at 10:57:08PM -0400, Peter Eisentraut wrote:

On Tue, 2012-08-14 at 17:36 -0400, Bruce Momjian wrote:

On Tue, Aug 14, 2012 at 05:34:02PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Thu, Aug 4, 2011 at 09:00:11PM +0300, Peter Eisentraut wrote:

On tor, 2011-08-04 at 14:44 +0200, Boszormenyi Zoltan wrote:

I meant a mass "sed -e 's/TRUE/true/g' -e 's/FALSE/false/g'" run
so all the ~200 occurrences of both "TRUE" and "FALSE" get
converted so the whole source tree is consistent.

I would be in favor of that.

I have implemented this with the patch at:
http://momjian.us/expire/true_diff.txt

Does this really do anything for us that will justify the extra
back-patching pain it will cause? I don't see that it's improving
code readability any.

I think it is more of a consistency issue. There were multiple people
who wanted this change. Of course, some of those people don't backport
stuff.

I guess you could argue this particular case without end, but I think we
should be open to these kinds of changes. They will make future code
easier to deal with and confuse new developers less (when to use which,
do they mean different things, etc.).

If back-patching really becomes a problem, we might want to look a
little deeper into git options. I think the Linux kernel people do
these kinds of cleanups more often, so there is probably some better
support for it.

So what do we want to do with this? I am a little concerned that we are
sacrificing code clarity for backpatching ease, but I don't do as much
backpatching as Tom.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#12Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Bruce Momjian (#11)
Re: TRUE/FALSE vs true/false

Bruce Momjian <bruce@momjian.us> wrote:

So what do we want to do with this? I am a little concerned that
we are sacrificing code clarity for backpatching ease, but I don't
do as much backpatching as Tom.

Well, if you back-patched this change, it would eliminate the issue
for Tom, wouldn't it? Not sure if that's sane; just a thought.

-Kevin

#13Bruce Momjian
bruce@momjian.us
In reply to: Kevin Grittner (#12)
Re: TRUE/FALSE vs true/false

On Thu, Aug 16, 2012 at 02:21:12PM -0500, Kevin Grittner wrote:

Bruce Momjian <bruce@momjian.us> wrote:

So what do we want to do with this? I am a little concerned that
we are sacrificing code clarity for backpatching ease, but I don't
do as much backpatching as Tom.

Well, if you back-patched this change, it would eliminate the issue
for Tom, wouldn't it? Not sure if that's sane; just a thought.

I would be worried about some instability in backpatching. I was
looking for an 'ignore-case' mode to patch, but I don't see it.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#14Robert Haas
robertmhaas@gmail.com
In reply to: Bruce Momjian (#13)
Re: TRUE/FALSE vs true/false

On Thu, Aug 16, 2012 at 3:32 PM, Bruce Momjian <bruce@momjian.us> wrote:

On Thu, Aug 16, 2012 at 02:21:12PM -0500, Kevin Grittner wrote:

Bruce Momjian <bruce@momjian.us> wrote:

So what do we want to do with this? I am a little concerned that
we are sacrificing code clarity for backpatching ease, but I don't
do as much backpatching as Tom.

Well, if you back-patched this change, it would eliminate the issue
for Tom, wouldn't it? Not sure if that's sane; just a thought.

I would be worried about some instability in backpatching. I was
looking for an 'ignore-case' mode to patch, but I don't see it.

I have difficult believing that a change of this type, if implemented
judiciously, is really going to create that much difficulty in
back-patching. I don't do as much back-patching as Tom either (no one
does), but most of the patches I do back-patch can be cherry-picked
all the way back without a problem. Some require adjustment, but even
then this kind of thing is pretty trivial to handle, as it's pretty
obvious what happened when you look through it. The really nasty
problems tend to come from places where the code has been rearranged,
rather than simple A-for-B substitutions.

I think the thing we need to look at is what percentage of our code
churn is coming from stuff like this, versus what percentage of it is
coming from other factors. If we change 250,000 lines of code per
release cycle and of that this kind of thing accounts for 5,000 lines
of deltas, then IMHO it's not really material. If it accounts for
50,000 lines of deltas out of the same base, that's probably more than
can really be justified by the benefit we're going to get out of it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#15Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#14)
Re: TRUE/FALSE vs true/false

On Mon, Aug 20, 2012 at 03:09:08PM -0400, Robert Haas wrote:

I have difficult believing that a change of this type, if implemented
judiciously, is really going to create that much difficulty in
back-patching. I don't do as much back-patching as Tom either (no one
does), but most of the patches I do back-patch can be cherry-picked
all the way back without a problem. Some require adjustment, but even
then this kind of thing is pretty trivial to handle, as it's pretty
obvious what happened when you look through it. The really nasty
problems tend to come from places where the code has been rearranged,
rather than simple A-for-B substitutions.

I think the thing we need to look at is what percentage of our code
churn is coming from stuff like this, versus what percentage of it is
coming from other factors. If we change 250,000 lines of code per
release cycle and of that this kind of thing accounts for 5,000 lines
of deltas, then IMHO it's not really material. If it accounts for
50,000 lines of deltas out of the same base, that's probably more than
can really be justified by the benefit we're going to get out of it.

The true/false capitalization patch changes 1.2k lines.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#15)
Re: TRUE/FALSE vs true/false

Bruce Momjian <bruce@momjian.us> writes:

On Mon, Aug 20, 2012 at 03:09:08PM -0400, Robert Haas wrote:

I think the thing we need to look at is what percentage of our code
churn is coming from stuff like this, versus what percentage of it is
coming from other factors. If we change 250,000 lines of code per
release cycle and of that this kind of thing accounts for 5,000 lines
of deltas, then IMHO it's not really material. If it accounts for
50,000 lines of deltas out of the same base, that's probably more than
can really be justified by the benefit we're going to get out of it.

The true/false capitalization patch changes 1.2k lines.

I did a quick look at git diff --stat between recent branches:

$ git diff --shortstat REL9_0_9 REL9_1_5
3186 files changed, 314847 insertions(+), 210452 deletions(-)
$ git diff --shortstat REL9_1_5 REL9_2_BETA4
2037 files changed, 290919 insertions(+), 189487 deletions(-)

However, when you look at things a bit closer, these numbers are
misleading because they include the .po files, which seem to have huge
inter-branch churn --- well in excess of 100000 lines changed per
release, at least in git's simpleminded view. Excluding those, as well
as src/test/isolation/expected/prepared-transactions.out which added
34843 lines all by itself, I get
173080 insertions, 70300 deletions for 9.0.9 -> 9.1.5
130706 insertions, 55714 deletions for 9.1.5 -> 9.2beta4.
So it looks like we touch order-of-magnitude of 100K lines per release;
which still seems astonishingly high, but then this includes docs and
regression tests not just code. If I restrict the stat to *.[chyl]
files it's about half that:

$ git diff --numstat REL9_0_9 REL9_1_5 | grep '\.[chyl]$' | awk '{a += $1; b += $2}
END{print a,b}'
90234 33902
$ git diff --numstat REL9_1_5 REL9_2_BETA4 | grep '\.[chyl]$' | awk '{a += $1; b += $2}
END{print a,b}'
90200 42218

So a patch of 1K lines would by itself represent about 2% of the typical
inter-branch delta. Maybe that's below our threshold of pain, or maybe
it isn't. I'd be happier about it if there were a more compelling
argument for it, but to me it looks like extremely trivial neatnik-ism.

regards, tom lane

#17Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#16)
size of .po changesets

Excerpts from Tom Lane's message of jue ago 23 11:01:05 -0400 2012:

$ git diff --shortstat REL9_0_9 REL9_1_5
3186 files changed, 314847 insertions(+), 210452 deletions(-)
$ git diff --shortstat REL9_1_5 REL9_2_BETA4
2037 files changed, 290919 insertions(+), 189487 deletions(-)

However, when you look at things a bit closer, these numbers are
misleading because they include the .po files, which seem to have huge
inter-branch churn --- well in excess of 100000 lines changed per
release, at least in git's simpleminded view.

Yeah, IMHO .po files are handled pretty badly by SCMs. I wonder if we
could reduce the amount of git churn caused by those files by simply
removing all comment lines from these files as they are exported from
pgtranslation into postgresql.git? Since they are not "source" for
postgresql.git anyway, the other one being the canonical repository,
there doesn't seem to be any point to those lines ... or am I mistaken?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#18Roger Leigh
rleigh@codelibre.net
In reply to: Alvaro Herrera (#17)
Re: size of .po changesets

On Thu, Aug 23, 2012 at 11:21:35AM -0400, Alvaro Herrera wrote:

Excerpts from Tom Lane's message of jue ago 23 11:01:05 -0400 2012:

$ git diff --shortstat REL9_0_9 REL9_1_5
3186 files changed, 314847 insertions(+), 210452 deletions(-)
$ git diff --shortstat REL9_1_5 REL9_2_BETA4
2037 files changed, 290919 insertions(+), 189487 deletions(-)

However, when you look at things a bit closer, these numbers are
misleading because they include the .po files, which seem to have huge
inter-branch churn --- well in excess of 100000 lines changed per
release, at least in git's simpleminded view.

Yeah, IMHO .po files are handled pretty badly by SCMs. I wonder if we
could reduce the amount of git churn caused by those files by simply
removing all comment lines from these files as they are exported from
pgtranslation into postgresql.git? Since they are not "source" for
postgresql.git anyway, the other one being the canonical repository,
there doesn't seem to be any point to those lines ... or am I mistaken?

Have you considered adding "--no-location" to XGETTEXT_OPTIONS in
po/Makevars? This stops the massive churn through line renumbering
changes.

Regards,
Roger

--
.''`. Roger Leigh
: :' : Debian GNU/Linux http://people.debian.org/~rleigh/
`. `' schroot and sbuild http://alioth.debian.org/projects/buildd-tools
`- GPG Public Key F33D 281D 470A B443 6756 147C 07B3 C8BC 4083 E800

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Roger Leigh (#18)
Re: size of .po changesets

Roger Leigh <rleigh@codelibre.net> writes:

On Thu, Aug 23, 2012 at 11:21:35AM -0400, Alvaro Herrera wrote:

Yeah, IMHO .po files are handled pretty badly by SCMs. I wonder if we
could reduce the amount of git churn caused by those files by simply
removing all comment lines from these files as they are exported from
pgtranslation into postgresql.git?

Have you considered adding "--no-location" to XGETTEXT_OPTIONS in
po/Makevars? This stops the massive churn through line renumbering
changes.

I think the line numbers are actually useful to the translators --- at
least, the theory behind having them is to make it easy to look at the
message in context. Alvaro's point is that the copies of the .po files
in our SCM are pretty much write-only data, and could be stripped
relative to what the translators work with.

regards, tom lane

#20Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#19)
Re: size of .po changesets

Excerpts from Tom Lane's message of jue ago 23 13:33:46 -0400 2012:

Roger Leigh <rleigh@codelibre.net> writes:

On Thu, Aug 23, 2012 at 11:21:35AM -0400, Alvaro Herrera wrote:

Yeah, IMHO .po files are handled pretty badly by SCMs. I wonder if we
could reduce the amount of git churn caused by those files by simply
removing all comment lines from these files as they are exported from
pgtranslation into postgresql.git?

Have you considered adding "--no-location" to XGETTEXT_OPTIONS in
po/Makevars? This stops the massive churn through line renumbering
changes.

I think the line numbers are actually useful to the translators --- at
least, the theory behind having them is to make it easy to look at the
message in context.

Yeah, and I, for one, do use them quite a bit to look up the code
context when the message is unclear.

Alvaro's point is that the copies of the .po files
in our SCM are pretty much write-only data, and could be stripped
relative to what the translators work with.

Right.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#21Peter Eisentraut
peter_e@gmx.net
In reply to: Alvaro Herrera (#17)
Re: size of .po changesets

On Thu, 2012-08-23 at 11:21 -0400, Alvaro Herrera wrote:

Yeah, IMHO .po files are handled pretty badly by SCMs.

By SCMs that store diffs internally, perhaps, but Git doesn't, so I
don't think it matters much for storage whether .po files diff well.

I wonder if we
could reduce the amount of git churn caused by those files by simply
removing all comment lines from these files as they are exported from
pgtranslation into postgresql.git? Since they are not "source" for
postgresql.git anyway, the other one being the canonical repository,
there doesn't seem to be any point to those lines ... or am I mistaken?

I don't see this being worth the trouble. It would just make it more
difficult to track where the files are coming from. There could also be
problems with downstream distributors if we are not shipping files in
source form.

#22Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#16)
Re: TRUE/FALSE vs true/false

On Thu, Aug 23, 2012 at 11:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Mon, Aug 20, 2012 at 03:09:08PM -0400, Robert Haas wrote:

I think the thing we need to look at is what percentage of our code
churn is coming from stuff like this, versus what percentage of it is
coming from other factors. If we change 250,000 lines of code per
release cycle and of that this kind of thing accounts for 5,000 lines
of deltas, then IMHO it's not really material. If it accounts for
50,000 lines of deltas out of the same base, that's probably more than
can really be justified by the benefit we're going to get out of it.

The true/false capitalization patch changes 1.2k lines.

I did a quick look at git diff --stat between recent branches:

$ git diff --shortstat REL9_0_9 REL9_1_5
3186 files changed, 314847 insertions(+), 210452 deletions(-)
$ git diff --shortstat REL9_1_5 REL9_2_BETA4
2037 files changed, 290919 insertions(+), 189487 deletions(-)

However, when you look at things a bit closer, these numbers are
misleading because they include the .po files, which seem to have huge
inter-branch churn --- well in excess of 100000 lines changed per
release, at least in git's simpleminded view. Excluding those, as well
as src/test/isolation/expected/prepared-transactions.out which added
34843 lines all by itself, I get
173080 insertions, 70300 deletions for 9.0.9 -> 9.1.5
130706 insertions, 55714 deletions for 9.1.5 -> 9.2beta4.
So it looks like we touch order-of-magnitude of 100K lines per release;
which still seems astonishingly high, but then this includes docs and
regression tests not just code. If I restrict the stat to *.[chyl]
files it's about half that:

$ git diff --numstat REL9_0_9 REL9_1_5 | grep '\.[chyl]$' | awk '{a += $1; b += $2}
END{print a,b}'
90234 33902
$ git diff --numstat REL9_1_5 REL9_2_BETA4 | grep '\.[chyl]$' | awk '{a += $1; b += $2}
END{print a,b}'
90200 42218

So a patch of 1K lines would by itself represent about 2% of the typical
inter-branch delta. Maybe that's below our threshold of pain, or maybe
it isn't. I'd be happier about it if there were a more compelling
argument for it, but to me it looks like extremely trivial neatnik-ism.

I wouldn't mind a bit if we devoted 2% of our inter-branch deltas to
this sort of thing, but I've got to admit that 2% for one patch seems
a little on the high side to me. It might not be a bad idea to
establish one formulation or the other as the one to be used in all
new code, though, to avoid making the problem worse.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#23Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#22)
Re: TRUE/FALSE vs true/false

On Sat, Aug 25, 2012 at 12:26:30PM -0400, Robert Haas wrote:

So a patch of 1K lines would by itself represent about 2% of the typical
inter-branch delta. Maybe that's below our threshold of pain, or maybe
it isn't. I'd be happier about it if there were a more compelling
argument for it, but to me it looks like extremely trivial neatnik-ism.

I wouldn't mind a bit if we devoted 2% of our inter-branch deltas to
this sort of thing, but I've got to admit that 2% for one patch seems
a little on the high side to me. It might not be a bad idea to
establish one formulation or the other as the one to be used in all
new code, though, to avoid making the problem worse.

Patch withdrawn. If we ever do a major code churn, it might be good to
revisit this cleanup.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +