A little report on informal commit tag usage

Started by Thomas Munroover 6 years ago11 messageshackers
Jump to latest
#1Thomas Munro
thomas.munro@gmail.com

On Fri, Jul 12, 2019 at 1:25 PM Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Jul 11, 2019 at 09:44:07AM -0400, Tom Lane wrote:

I thought we *did* have an agreement, to wit using

Discussion: /messages/by-id/&lt;message-id&gt;

to link to relevant mail thread(s). Some people use more tags
but that seems inessential to me.

Hehe. I actually was thinking about advocating for having more of
them in the commit logs. I'll just start a new thread about what I
had in mind. Perhaps that will lead us nowhere, but let's see.

[Moving to -hackers]

Here are the tags that people have used in the past year, in commit messages:

763 Author
9 Authors
144 Backpatch-through
55 Backpatch
14 Bug
14 Co-authored-by
27 Diagnosed-By
1593 Discussion
42 Doc
284 Reported-By
5 Review
8 Reviewed by
456 Reviewed-By
7 Security
9 Tested-By

Other things I've noticed:

* a few people list authors and reviewers in prose in a fairly
mechanical paragraph
* some people put back-patch and bug number information in prose
* a few people list authors and reviewers with full email addresses
* some people repeat tags for multiple values, others make comma separated lists
* some people break long lines of meta-data with newlines
* authors "X and Y" may be an alternative to "X, Y", or imply greater
collaboration

The counts above were produced by case-insensitively sorting and
counting unique stuff that precedes a colon, and then throwing out
those used fewer than three times (these are false matches and typos),
and then throwing out a couple of obvious false matches by hand.
Starting from here:

git log --since 2018-07-14 | \
grep -E '^ *[A-Z].*: ' | \
sort -i | \
sed 's/:.*//' | \
uniq -ic | \
grep -v -E '^ *[12] '

--
Thomas Munro
https://enterprisedb.com

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#1)
Re: A little report on informal commit tag usage

Thomas Munro <thomas.munro@gmail.com> writes:

Here are the tags that people have used in the past year, in commit messages:

763 Author
9 Authors
144 Backpatch-through
55 Backpatch
14 Bug
14 Co-authored-by
27 Diagnosed-By
1593 Discussion
42 Doc
284 Reported-By
5 Review
8 Reviewed by
456 Reviewed-By
7 Security
9 Tested-By

One small comment on that --- I'm not sure what you meant to count
in respect to the "Doc" item, but I believe there's a fairly widespread
convention to write "doc:" or some variant in the initial summary line
of commits that touch only documentation. The point here is to let
release-note writers quickly ignore such commits, since we never list
them as release note items. Bruce and I, being the usual suspects for
release-note writing, are pretty religious about this but other people
do it too. I see a lot more than 42 such commit messages in the past
year, so not sure what you were counting?

Anyway, that's not a "tag" in the sense I understand you to be using
(otherwise the entries would look something like "Doc: yes" and be at
the end, which is unhelpful for the purpose). But it's a related sort
of commit-message convention.

regards, tom lane

#3Thomas Munro
thomas.munro@gmail.com
In reply to: Tom Lane (#2)
Re: A little report on informal commit tag usage

On Mon, Jul 15, 2019 at 5:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Thomas Munro <thomas.munro@gmail.com> writes:

42 Doc

[...] I see a lot more than 42 such commit messages in the past
year, so not sure what you were counting?

I would have tried to exclude the first line messages if I'd thought
of that. But anyway, the reason for the low Doc number is case
sensitivity. I ran that on a Mac and its lame collation support failed
me in the "sort" step (also -i didn't do what I wanted, but that
wasn't the issue). Trying again on FreeBSD box and explicitly setting
LANG for the benefit of anyone else wanting to run this (see end), and
then removing a few obvious false matches, I now get similar numbers
in most fields but a higher "doc" number:

767 Author
9 Authors
144 Backpatch-through
55 Backpatch
14 Bug
14 Co-authored-by
27 Diagnosed-by
1599 Discussion
119 doc
36 docs
284 Reported-by
5 Review
8 Reviewed by
460 Reviewed-by
7 Security
9 Tested-by

git log --since 2018-07-14 | \
grep -E '^ +[a-zA-Z].*: ' | \
LANG=en_US.UTF-8 sort | \
sed 's/:.*//' | \
LANG=en_US.UTF-8 uniq -ic | \
grep -v -E '^ *[12] '

--
Thomas Munro
https://enterprisedb.com

#4Michael Paquier
michael@paquier.xyz
In reply to: Thomas Munro (#3)
Re: A little report on informal commit tag usage

On Mon, Jul 15, 2019 at 05:49:26PM +1200, Thomas Munro wrote:

I would have tried to exclude the first line messages if I'd thought
of that. But anyway, the reason for the low Doc number is case
sensitivity. I ran that on a Mac and its lame collation support failed
me in the "sort" step (also -i didn't do what I wanted, but that
wasn't the issue). Trying again on FreeBSD box and explicitly setting
LANG for the benefit of anyone else wanting to run this (see end), and
then removing a few obvious false matches, I now get similar numbers
in most fields but a higher "doc" number:

767 Author
9 Authors
144 Backpatch-through
55 Backpatch
14 Bug
14 Co-authored-by
27 Diagnosed-by
1599 Discussion
119 doc
36 docs
284 Reported-by
5 Review
8 Reviewed by
460 Reviewed-by
7 Security
9 Tested-by

Thanks for those numbers. I am wondering if we could do a bit of
consolidation here and write a page about this stuff on the wiki.
Getting the "Discussion" field most of the time is really cool.

I think that we could get some improvements on the following things.
Here is a set of ideas:
- Avoid "Authors" and replace it with "Author" even if there are
multiple authors.
- Avoid having multiple entries for each one of them? For example we
have a couple of commits listed listing one "Reviewed-by" field with
one single name.
- Most commit entries to not use the email address with the name of
the author, reviewer, tester or reporter. Perhaps we should give up
on that?
- Keep "Backpatch-through", not "Backpatch".
- Keep "Reviewed-by", not "Reviewed by" nor "Review".

"Security" is a special case, we append it to all the CVE-related
commits.

That is mainly a matter of taste, but I tend to prefer the following
format, protecting usually the order:
- Diagnosed-by
- Author
- Reviewed-by
- Discussion
- Backpatch-through
- I tend to have only one "Reviewed-by" entry with a list of names,
same for "Author" and "Reported-by".
- Only names, no emails.

As mentioned on different threads, "Discussion" is the only one we had
a strong agreement with. Could it be possible to consider things like
Author, Reported-by, Reviewed-by or Backpatch-through for example and
extend to that? The first three ones are useful for parsing the
commit logs. The fourth one is handy so as there is no need to look
at a full log tree with git log --graph or such, which is something I
do from time to time to guess down to where a fix has been applied (I
tend to avoid git_changelog).
--
Michael

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#4)
Re: A little report on informal commit tag usage

Michael Paquier <michael@paquier.xyz> writes:

As mentioned on different threads, "Discussion" is the only one we had
a strong agreement with. Could it be possible to consider things like
Author, Reported-by, Reviewed-by or Backpatch-through for example and
extend to that? The first three ones are useful for parsing the
commit logs. The fourth one is handy so as there is no need to look
at a full log tree with git log --graph or such, which is something I
do from time to time to guess down to where a fix has been applied (I
tend to avoid git_changelog).

FWIW, I'm one of the people who prefer prose for this. The backpatching
bit is a good example of why, because my log messages typically don't
just say "backpatch to 9.6" but something about why that was the cutoff.
For instance in 0ec3e13c6,

Per gripe from Ken Tanzer. Back-patch to 9.6. The issue exists
further back, but before 9.6 the code looks very different and it
doesn't actually know whether the "var" name matches anything,
so I desisted from trying to fix it.

I am in favor of trying to consistently mention that a patch is being
back-patched, rather than expecting people to rely on git metadata
to find that out. But I don't see that a rigid "Backpatch" tag format
makes anything easier there. If you need to know that mechanically,
git_changelog is way more reliable.

I'm also skeptical of the argument that machine-parseable Reported-by
and so forth are useful to anybody. Who'd use them, and for what?
Also, it's not always clear how to apply such a format to a real
situation --- eg, what do you do if the reporter is also the patch
author, or a co-author? I'm not excited about redundantly entering
somebody's name several times.

regards, tom lane

#6Daniel Gustafsson
daniel@yesql.se
In reply to: Tom Lane (#5)
Re: A little report on informal commit tag usage

On 16 Jul 2019, at 16:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Michael Paquier <michael@paquier.xyz> writes:

As mentioned on different threads, "Discussion" is the only one we had
a strong agreement with. Could it be possible to consider things like
Author, Reported-by, Reviewed-by or Backpatch-through for example and
extend to that? The first three ones are useful for parsing the
commit logs. The fourth one is handy so as there is no need to look
at a full log tree with git log --graph or such, which is something I
do from time to time to guess down to where a fix has been applied (I
tend to avoid git_changelog).

FWIW, I'm one of the people who prefer prose for this. The backpatching
bit is a good example of why, because my log messages typically don't
just say "backpatch to 9.6" but something about why that was the cutoff.

Wearing my $work-hat where I regularly perform interesting merges of postgres
releases as an upstream, these detailed commit messages are very valuable and
much appreciated. The wealth of (human readable) information stored in the
commit logs makes tracking postgres as an upstream quite a lot easier.

I'm also skeptical of the argument that machine-parseable Reported-by
and so forth are useful to anybody. Who'd use them, and for what?

The green gamification dot on people’s Github profiles might light up if the
machine readable format with email address was used (and the user has that
specific email connected to their Github account unless it’s a primary email).
Looking at commit 1c9bb02d8ec1d5b1b319e4fed70439a403c245b1 I can see that for
August 2018 Amit’s Github profile lists “Created 1 commit in 1 repository
postgres/postgres 1 commit”, which is likely from this commit message being
parsed in the mirror.

cheers ./daniel

#7Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#5)
Re: A little report on informal commit tag usage

Hi,

On 2019-07-16 10:33:06 -0400, Tom Lane wrote:

Michael Paquier <michael@paquier.xyz> writes:

As mentioned on different threads, "Discussion" is the only one we had
a strong agreement with. Could it be possible to consider things like
Author, Reported-by, Reviewed-by or Backpatch-through for example and
extend to that? The first three ones are useful for parsing the
commit logs. The fourth one is handy so as there is no need to look
at a full log tree with git log --graph or such, which is something I
do from time to time to guess down to where a fix has been applied (I
tend to avoid git_changelog).

FWIW, I'm one of the people who prefer prose for this. The backpatching
bit is a good example of why, because my log messages typically don't
just say "backpatch to 9.6" but something about why that was the cutoff.

They don't preclude each other though. E.g. it'd be sensible to have both

Per gripe from Ken Tanzer. Back-patch to 9.6. The issue exists
further back, but before 9.6 the code looks very different and it
doesn't actually know whether the "var" name matches anything,
so I desisted from trying to fix it.

and "Backpatch: 9.6-" or such.

I am in favor of trying to consistently mention that a patch is being
back-patched, rather than expecting people to rely on git metadata
to find that out. But I don't see that a rigid "Backpatch" tag format
makes anything easier there. If you need to know that mechanically,
git_changelog is way more reliable.

I find it useful to have a quick place to scan in a commit message. It's
a lot quicker to focus on the last few lines with tags, and see a
'Backpatch: 9.6-' than to parse a potentially long commit message. If
I'm then still interested in the commit, I'll then read the commit.

Greetings,

Andres Freund

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#7)
Re: A little report on informal commit tag usage

Andres Freund <andres@anarazel.de> writes:

They don't preclude each other though. E.g. it'd be sensible to have both

Per gripe from Ken Tanzer. Back-patch to 9.6. The issue exists
further back, but before 9.6 the code looks very different and it
doesn't actually know whether the "var" name matches anything,
so I desisted from trying to fix it.

and "Backpatch: 9.6-" or such.

I've wondered for some time what you think the "-" means in this.

regards, tom lane

#9Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#8)
Re: A little report on informal commit tag usage

Hi,

On 2019-07-16 19:26:59 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

They don't preclude each other though. E.g. it'd be sensible to have both

Per gripe from Ken Tanzer. Back-patch to 9.6. The issue exists
further back, but before 9.6 the code looks very different and it
doesn't actually know whether the "var" name matches anything,
so I desisted from trying to fix it.

and "Backpatch: 9.6-" or such.

I've wondered for some time what you think the "-" means in this.

Up to master. Occasionally there's bugs that only need to be fixed in
some back branches etc.

Greetings,

Andres Freund

#10Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Daniel Gustafsson (#6)
Re: A little report on informal commit tag usage

On 2019-Jul-16, Daniel Gustafsson wrote:

The green gamification dot on people’s Github profiles might light up if the
machine readable format with email address was used (and the user has that
specific email connected to their Github account unless it’s a primary email).
Looking at commit 1c9bb02d8ec1d5b1b319e4fed70439a403c245b1 I can see that for
August 2018 Amit’s Github profile lists “Created 1 commit in 1 repository
postgres/postgres 1 commit”, which is likely from this commit message being
parsed in the mirror.

I specifically use "co-authored-by" (and scanning the grep results, I'm
the only person doing it) because github recognizes it in this way.
However I only feel entitled to use it when the patch has been developed
by me plus some other person(s), which has a bit of a contradictory result:
when I don't touch some submitted patch, I use "Author" since I (the
committer) am not a co-author. That means github attributes such
patches solely to me :-(

I realize now, however, that in order for this to work I have to include
the email address, not just the name. I failed to do that at least
once.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#11Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#9)
Re: A little report on informal commit tag usage

On Tue, Jul 16, 2019 at 04:33:07PM -0700, Andres Freund wrote:

On 2019-07-16 19:26:59 -0400, Tom Lane wrote:

I've wondered for some time what you think the "-" means in this.

Up to master. Occasionally there's bugs that only need to be fixed in
some back branches etc.

Is "-" most common to define a range of branches? I would have
imagined that "~" makes more sense here.
--
Michael