Maintaining a list of pgindent commits for "git blame" to ignore
Recent versions of git are capable of maintaining a list of commits
for "git blame" to ignore:
https://www.moxio.com/blog/43/ignoring-bulk-change-commits-with-git-blame
I tried this out myself, using my own list of pgindent commits. It
works very well -- much better than what you get when you ask git to
heuristically ignore commits based on whitespace-only line changes.
This is not surprising, in a way; I don't actually want to avoid
whitespace. I just want to ignore pgindent commits.
Note that there are still a small number of pgindent line changes,
even with this. That's because sometimes it's unavoidable -- some
"substantively distinct lines of code" are actually created by
pgindent. But these all seem to be <CR> lines that are only shown
because there is legitimately no more appropriate commit to attribute
the line to. This seems like the ideal behavior to me.
I propose that we (I suppose I actually mean Bruce) start maintaining
our own file for this in git. It can be configured to run without any
extra steps via a once-off "git config blame.ignoreRevsFile
.git-blame-ignore-revs". It would only need to be updated whenever
Bruce or Tom runs pgindent.
It doesn't matter if this misses one or two smaller pgindent runs, it
seems. Provided the really huge ones are in the file, everything works
very well.
--
Peter Geoghegan
On Thu, Mar 18, 2021 at 01:46:49PM -0700, Peter Geoghegan wrote:
Recent versions of git are capable of maintaining a list of commits
for "git blame" to ignore:https://www.moxio.com/blog/43/ignoring-bulk-change-commits-with-git-blame
I tried this out myself, using my own list of pgindent commits. It
works very well -- much better than what you get when you ask git to
heuristically ignore commits based on whitespace-only line changes.
This is not surprising, in a way; I don't actually want to avoid
whitespace. I just want to ignore pgindent commits.Note that there are still a small number of pgindent line changes,
even with this. That's because sometimes it's unavoidable -- some
"substantively distinct lines of code" are actually created by
pgindent. But these all seem to be <CR> lines that are only shown
because there is legitimately no more appropriate commit to attribute
the line to. This seems like the ideal behavior to me.I propose that we (I suppose I actually mean Bruce) start maintaining
Actually, Tom Lane runs pgindent usually now. I do the copyright
change, but I don't think we would ignore those since the single-line
change is probably something we would want to blame.
our own file for this in git. It can be configured to run without any
extra steps via a once-off "git config blame.ignoreRevsFile
.git-blame-ignore-revs". It would only need to be updated whenever
Bruce or Tom runs pgindent.It doesn't matter if this misses one or two smaller pgindent runs, it
seems. Provided the really huge ones are in the file, everything works
very well.
It would certainly be very easy to pull pgindent commits out of git log
and add them. I do wish we could cause everyone to honor that file, but
it seems each user has to configure their repository to honor it.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
If only the physical world exists, free will is an illusion.
On Thu, Mar 18, 2021 at 2:10 PM Bruce Momjian <bruce@momjian.us> wrote:
Actually, Tom Lane runs pgindent usually now. I do the copyright
change, but I don't think we would ignore those since the single-line
change is probably something we would want to blame.
The copyright change commits don't need to be considered here. In
practice they're just not a problem because nobody wants or expects
"git blame" to do anything more than attribute an affected line to a
copyright commit.
It would certainly be very easy to pull pgindent commits out of git log
and add them. I do wish we could cause everyone to honor that file, but
it seems each user has to configure their repository to honor it.
That doesn't seem like a huge problem. There is no reason why this
shouldn't be easy to use and to maintain going forward. There just
aren't very many commits involved.
Attached is my .git-blame-ignore-revs file, which has pgindent commits
that I'd like to exclude from git blame. The file is helpful on its
own. But what we really ought to do is commit the file (perhaps with
some revisions) and require that it be maintained by the official
project workflow documented at src/tools/pgindent/README.
--
Peter Geoghegan
Attachments:
On Thu, Mar 18, 2021 at 03:03:41PM -0700, Peter Geoghegan wrote:
Attached is my .git-blame-ignore-revs file, which has pgindent commits
that I'd like to exclude from git blame. The file is helpful on its
own. But what we really ought to do is commit the file (perhaps with
some revisions) and require that it be maintained by the official
project workflow documented at src/tools/pgindent/README.
It would be kind of nice if the file can be generated automatically. I
have you checked if 'pgindent' being on the first line of the commit is
sufficient?
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
If only the physical world exists, free will is an illusion.
On Thu, Mar 18, 2021 at 3:12 PM Bruce Momjian <bruce@momjian.us> wrote:
It would be kind of nice if the file can be generated automatically. I
have you checked if 'pgindent' being on the first line of the commit is
sufficient?
I generated the file by looking for commits that:
1) Mentioned "pgindent" or "PGINDENT" in the entire commit message.
2) Had more than 20 or 30 files changed.
This left me with fewer than 50 commits that cover over 20 years of
history since the first pgindent commit. I also added one or two
others that I somehow missed (maybe you happened to spell it "pg
indent" that year) through trial and error. The file that I sent to
the list works really well for me.
I don't think that it's a good idea to automate this process, because
we certainly don't want to let incorrect entries slip in. And because
there just isn't a lot left to automate -- running pgindent on the
tree is something that happens no more than 2 or 3 times a year. It
could easily be added to the checklist in the README. It should take
less than 5 minutes a year.
--
Peter Geoghegan
On Thu, Mar 18, 2021 at 03:20:37PM -0700, Peter Geoghegan wrote:
On Thu, Mar 18, 2021 at 3:12 PM Bruce Momjian <bruce@momjian.us> wrote:
It would be kind of nice if the file can be generated automatically. I
have you checked if 'pgindent' being on the first line of the commit is
sufficient?I generated the file by looking for commits that:
1) Mentioned "pgindent" or "PGINDENT" in the entire commit message.
2) Had more than 20 or 30 files changed.
This left me with fewer than 50 commits that cover over 20 years of
history since the first pgindent commit. I also added one or two
others that I somehow missed (maybe you happened to spell it "pg
indent" that year) through trial and error. The file that I sent to
the list works really well for me.I don't think that it's a good idea to automate this process, because
we certainly don't want to let incorrect entries slip in. And because
there just isn't a lot left to automate -- running pgindent on the
tree is something that happens no more than 2 or 3 times a year. It
could easily be added to the checklist in the README. It should take
less than 5 minutes a year.
Sounds like a plan. We should mention adding to this file somewhere in
our pgindent README.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
If only the physical world exists, free will is an illusion.
Peter Geoghegan <pg@bowt.ie> writes:
Attached is my .git-blame-ignore-revs file, which has pgindent commits
that I'd like to exclude from git blame. The file is helpful on its
own. But what we really ought to do is commit the file (perhaps with
some revisions) and require that it be maintained by the official
project workflow documented at src/tools/pgindent/README.
I don't object to maintaining such a file; if it makes "git blame"
work better, that's a huge win. However, the file as you have it
seems rather unreadable. I'd much rather have something that includes
the commit date and/or first line of commit message. Is there any
flexibility in the format, or does git blame insist it be just like this?
regards, tom lane
On Thu, Mar 18, 2021 at 3:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I don't object to maintaining such a file; if it makes "git blame"
work better, that's a huge win. However, the file as you have it
seems rather unreadable. I'd much rather have something that includes
the commit date and/or first line of commit message. Is there any
flexibility in the format, or does git blame insist it be just like this?
I ended up doing it that way because I was in a hurry to see how much
it helped. I can fix it up.
We could require (but not automatically enforce) that the first line
of the commit message be included above each hash, as a comment. You
could also require reverse chronological ordering of commits. That
would make everything easy to follow.
It's worth noting that git insists that you provide the full hash of
commits here. This is not something I remember it insisting upon in
any other area. There is probably a very good practical reason for
that.
--
Peter Geoghegan
On Thu, Mar 18, 2021 at 03:46:10PM -0700, Peter Geoghegan wrote:
It's worth noting that git insists that you provide the full hash of
commits here. This is not something I remember it insisting upon in
any other area. There is probably a very good practical reason for
that.
Probably because later commits might collide with shorter hashes. When
you are reporting a hash that only looks _backward_, this is not an
issue.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
If only the physical world exists, free will is an illusion.
On Thu, Mar 18, 2021 at 4:00 PM Bruce Momjian <bruce@momjian.us> wrote:
Probably because later commits might collide with shorter hashes. When
you are reporting a hash that only looks _backward_, this is not an
issue.
Right, but it's extremely unlikely to happen by accident. I was
suggesting that there might be a security issue. I could fairly easily
make my git commit match a prefix intended to uniquely identify your
git commit if I set out to do so.
There are projects that might have to consider that possibility,
though perhaps we're not one of them.
--
Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes:
On Thu, Mar 18, 2021 at 3:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I don't object to maintaining such a file; if it makes "git blame"
work better, that's a huge win. However, the file as you have it
seems rather unreadable. I'd much rather have something that includes
the commit date and/or first line of commit message. Is there any
flexibility in the format, or does git blame insist it be just like this?
I ended up doing it that way because I was in a hurry to see how much
it helped. I can fix it up.
We could require (but not automatically enforce) that the first line
of the commit message be included above each hash, as a comment. You
could also require reverse chronological ordering of commits. That
would make everything easy to follow.
Given that the file will be added to manually, I think just having an
existing format to follow will be easy enough. I'd suggest something
like
b5d69b7c22ee4c44b8bb99cfa0466ffaf3b5fab9 # Sun Jun 7 16:57:08 2020 -0400
# pgindent run prior to branching v13.
which is easy to make from "git log" or "git show" output. (Because
of the precedent of those tools, I'd rather write the commit hash
before the rest of the entry.)
The date is important IMO because otherwise it's quite unclear whether
to add a new entry at the top or the bottom.
Other points: the file should be kept in src/tools/pgindent/, and
it should definitely NOT have a name beginning with ".".
regards, tom lane
On Thu, Mar 18, 2021 at 07:05:03PM -0400, Tom Lane wrote:
Other points: the file should be kept in src/tools/pgindent/, and
it should definitely NOT have a name beginning with ".".
Well, if we want github and others to eventually honor a file, it seems
.git-blame-ignore-revs at the top of the tree is the common location for
this. Of course, I don't know if they will ever do that, and can move
it later if needed.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
If only the physical world exists, free will is an illusion.
On Thu, Mar 18, 2021 at 4:05 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
b5d69b7c22ee4c44b8bb99cfa0466ffaf3b5fab9 # Sun Jun 7 16:57:08 2020 -0400
# pgindent run prior to branching v13.which is easy to make from "git log" or "git show" output. (Because
of the precedent of those tools, I'd rather write the commit hash
before the rest of the entry.)
WFM.
What about reformat-dat-files and perltidy runs? It seems that you
have sometimes used all three reformatting tools to produce one commit
-- but not always. ISTM that I should get any of those that I missed.
And that the pgindent README (which already mentions these other
tools) ought to be updated to be explicit about the policy applying
equally to commits that apply any of the two other tools in bulk.
--
Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes:
What about reformat-dat-files and perltidy runs? It seems that you
have sometimes used all three reformatting tools to produce one commit
-- but not always. ISTM that I should get any of those that I missed.
And that the pgindent README (which already mentions these other
tools) ought to be updated to be explicit about the policy applying
equally to commits that apply any of the two other tools in bulk.
Good question. We don't have a standard about that (whether to
do those in separate or the same commits), but we could establish one
if it seems helpful.
regards, tom lane
On Thu, Mar 18, 2021 at 4:40 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Good question. We don't have a standard about that (whether to
do those in separate or the same commits), but we could establish one
if it seems helpful.
I don't think that it matters too much, but it will necessitate
updating the file multiple times. It might become natural to just do
everything together in a way that it wasn't before.
The really big wins come from excluding the enormous pgindent run
commits, especially for the few historic pgindent runs where the rules
changed -- there are no more than a handful of those. They tend to
generate an enormous amount of churn that touches almost everything.
So it probably isn't necessary to worry about smaller things.
--
Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes:
On Thu, Mar 18, 2021 at 4:40 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Good question. We don't have a standard about that (whether to
do those in separate or the same commits), but we could establish one
if it seems helpful.
I don't think that it matters too much, but it will necessitate
updating the file multiple times. It might become natural to just do
everything together in a way that it wasn't before.
Doubt that it matters. The workflow would have to be "commit and push
the mechanical updates, then edit the tracking file, commit and push
that". You don't have the commit hash nailed down till you've pushed.
So if we decided to do the mechanical updates in several commits,
not just one, I'd still be inclined to do them all and then edit the
tracking file once.
regards, tom lane
On Thu, Mar 18, 2021 at 5:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Doubt that it matters. The workflow would have to be "commit and push
the mechanical updates, then edit the tracking file, commit and push
that". You don't have the commit hash nailed down till you've pushed.
Okay. I have made a personal TODO list item for this. I'll pick this
up again in April, once the final CF is over.
--
Peter Geoghegan
On Thu, Mar 18, 2021 at 4:32 PM Peter Geoghegan <pg@bowt.ie> wrote:
On Thu, Mar 18, 2021 at 4:05 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
b5d69b7c22ee4c44b8bb99cfa0466ffaf3b5fab9 # Sun Jun 7 16:57:08 2020 -0400
# pgindent run prior to branching v13.which is easy to make from "git log" or "git show" output. (Because
of the precedent of those tools, I'd rather write the commit hash
before the rest of the entry.)WFM.
What do you think of the attached? I prefer the ISO date style myself,
so I went with that.
Note that I have included "Modify BufferGetPage() to prepare for
"snapshot too old" feature", as well as "Revert no-op changes to
BufferGetPage()". I've noticed that those two particular commits cause
unhelpful noise when I run "git blame" on access method code. I see
problems with these commits often enough to matter. The latter commit
cleanly reverted the former after only 12 days, so ignoring both seems
okay to me.
Everything else should be either pgindent/perltidy related or
reformat-dat-files related.
--
Peter Geoghegan
Attachments:
Peter Geoghegan <pg@bowt.ie> writes:
What do you think of the attached? I prefer the ISO date style myself,
so I went with that.
I grepped the git logs for "indent" and found a bunch more commits
that look like they should be included:
db6e2b4c5
d84213909
1e9c85809
f04d4ac91
9ef2dbefc
651902deb
ce5548103
b5bce6c1e
de94e2af1
d0cd7bda9
befa3e648
7584649a1
84288a86a
d74714027
b6b71b85b
46785776c
089003fb4
ea08e6cd5
59f6a57e5
It's possible that some of these touch few enough lines that they
are not worth listing; I did not check the commit delta sizes.
Note that I have included "Modify BufferGetPage() to prepare for
"snapshot too old" feature", as well as "Revert no-op changes to
BufferGetPage()". I've noticed that those two particular commits cause
unhelpful noise when I run "git blame" on access method code.
Meh. I can get on board with the idea of adding commit+revert pairs
to this list, but I'd like a more principled selection filter than
"which ones bother Peter". Maybe the size of the reverted patch
should factor into it.
Do we have an idea of how much adding entries to this list slows
down "git blame"? If we include commit+revert pairs more than
very sparingly, I'm afraid we'll soon have an enormous list, and
I wonder what that will cost us.
regards, tom lane
On Mon, Jun 21, 2021 at 8:34 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
It's possible that some of these touch few enough lines that they
are not worth listing; I did not check the commit delta sizes.
Commits that touch very few lines weren't included originally, just
because it didn't seem necessary. Even still, I've looked through the
extra commits now, and everything that you picked out looks in scope.
I'm just going to include these extra commits.
Attached is a new version of the same file, based on your feedback
(with those extra commits, and some commits from the last version
removed). I'll produce a conventional patch file in the next revision,
most likely.
Meh. I can get on board with the idea of adding commit+revert pairs
to this list, but I'd like a more principled selection filter than
"which ones bother Peter". Maybe the size of the reverted patch
should factor into it
I have to admit that you're right. That was why I picked those two
out. Of course I can defend this choice in detail, but in the interest
of not setting a terrible precedent I won't do that. The commits in
question have been removed from this revised version.
I think it's important that we not get into the business of adding
stuff to this willy-nilly. Inevitably everybody will have their own
pet peeve noisy commit, and will want to add to the list -- just like
I did. Naturally nobody will be interested in arguing against
including whatever individual pet peeve commit each time this comes
up. Regardless of the merits of the case. Let's just not go there
(unless perhaps it's truly awful for almost everybody).
Do we have an idea of how much adding entries to this list slows
down "git blame"? If we include commit+revert pairs more than
very sparingly, I'm afraid we'll soon have an enormous list, and
I wonder what that will cost us.
I doubt it costs us much, at least in any way that has a very
noticeable relationship as new commits are added. I've now settled on
68 commits, and expect that this won't need to grow very quickly, so
that seems fine. From my point of view it makes "git blame" far more
useful.
LLVM uses a file with fewer entries, and have had such a file since last year:
https://github.com/llvm/llvm-project/blob/main/.git-blame-ignore-revs
The list of commit hashes in the file that the Blender project uses is
about the same size:
https://github.com/blender/blender/blob/master/.git-blame-ignore-revs
We're using more commits than either project here, but it's within an
order of magnitude. Note that this is going to be opt-in, not opt-out.
It won't do anything unless an individual hacker decides to enable it
locally.
The convention seems to be that it is located in the top-level
directory. ISTM that we should follow that convention, since following
the convention is good, and does not in itself force anybody to ignore
any of the listed commits. Any thoughts on that?
--
Peter Geoghegan