run pgindent on a regular basis / scripted manner
Hi,
When developing patches I find it fairly painful that I cannot re-indent
patches with pgindent without also seeing a lot of indentation changes
in unmodified parts of files. It is easy enough ([1]./src/tools/pgindent/pgindent $(git diff-tree --no-commit-id --name-only -r upstream/master..HEAD|grep -v src/test|grep -v READ ME|grep -v typedefs.list)) to only re-indent
files that I have modified, but there's often a lot of independent
indentation changes in the files that I did modified.
I e.g. just re-indented patch 0001 of my GetSnapshotData() series and
most of the hunks were entirely unrelated. Despite the development
window for 14 having only relatively recently opened. Based on my
experience it tends to get worse over time.
Is there any reason we don't just automatically run pgindent regularly?
Like once a week? And also update typedefs.list automatically, while
we're at it?
Currently the yearly pgindent runs are somewhat painful for patches that
didn't make it into the release, but if we were to reindent on a more
regular basis, that should be much less the case. It'd also help newer
developers who we sometimes tell to use pgindent - which doesn't really
work.
Greetings,
Andres Freund
[1]: ./src/tools/pgindent/pgindent $(git diff-tree --no-commit-id --name-only -r upstream/master..HEAD|grep -v src/test|grep -v READ ME|grep -v typedefs.list)
On 2020-Aug-12, Andres Freund wrote:
Is there any reason we don't just automatically run pgindent regularly?
Like once a week? And also update typedefs.list automatically, while
we're at it?
Seconded.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi Andres,
On Wed, Aug 12, 2020 at 3:34 PM Andres Freund wrote:
Hi,
When developing patches I find it fairly painful that I cannot re-indent
patches with pgindent without also seeing a lot of indentation changes
in unmodified parts of files. It is easy enough ([1]) to only re-indent
files that I have modified, but there's often a lot of independent
indentation changes in the files that I did modified.I e.g. just re-indented patch 0001 of my GetSnapshotData() series and
most of the hunks were entirely unrelated. Despite the development
window for 14 having only relatively recently opened. Based on my
experience it tends to get worse over time.
How bad was it right after branching 13? I wonder if we have any
empirical measure of badness over time -- assuming there was a point in
the recent past where everything was good, and the bad just crept in.
Is there any reason we don't just automatically run pgindent regularly?
Like once a week? And also update typedefs.list automatically, while
we're at it?
You know what's better than weekly? Every check-in. I for one would love
it if we can just format the entire codebase, and ensure that new
check-ins are also formatted. We _do_ need some form of continuous
integration to catch us when we have fallen short (again, once HEAD
reaches a "known good" state, it's conceivably cheap to keep it in the
good state.
Cheers,
Jesse
Hi,
On 2020-08-12 16:08:50 -0700, Jesse Zhang wrote:
On Wed, Aug 12, 2020 at 3:34 PM Andres Freund wrote:
Hi,
When developing patches I find it fairly painful that I cannot re-indent
patches with pgindent without also seeing a lot of indentation changes
in unmodified parts of files. It is easy enough ([1]) to only re-indent
files that I have modified, but there's often a lot of independent
indentation changes in the files that I did modified.I e.g. just re-indented patch 0001 of my GetSnapshotData() series and
most of the hunks were entirely unrelated. Despite the development
window for 14 having only relatively recently opened. Based on my
experience it tends to get worse over time.How bad was it right after branching 13? I wonder if we have any
empirical measure of badness over time -- assuming there was a point in
the recent past where everything was good, and the bad just crept in.
Well, just after branching it was perfect, because pgindent was
customarily is run just before branching. After that it incrementally
gets worse.
Is there any reason we don't just automatically run pgindent regularly?
Like once a week? And also update typedefs.list automatically, while
we're at it?You know what's better than weekly? Every check-in. I for one would love
it if we can just format the entire codebase, and ensure that new
check-ins are also formatted. We _do_ need some form of continuous
integration to catch us when we have fallen short (again, once HEAD
reaches a "known good" state, it's conceivably cheap to keep it in the
good state.
Unfortunately that is, with the current tooling, not entirely trivial to
do so completely. The way we generate the list of known typedefs
unfortunately depends on the platform a build is run on. Therefore the
buildfarm collects a number of the generated list of typedefs from
different platforms, and then we use that combined list to run pgindent.
We surely can improve further, but I think having any automation around
this already would be a huge step.
Greetings,
Andres Freund
Jesse Zhang <sbjesse@gmail.com> writes:
On Wed, Aug 12, 2020 at 3:34 PM Andres Freund wrote:
Is there any reason we don't just automatically run pgindent regularly?
Like once a week? And also update typedefs.list automatically, while
we're at it?
You know what's better than weekly? Every check-in.
I'm not in favor of unsupervised pgindent runs, really. It can do a lot
of damage to code that was written without thinking about it --- in
particular, it'll make a hash of comment blocks that were manually
formatted and not protected with dashes.
If the workflow is commit first and re-indent later, then we can always
revert the pgindent commit and clean things up manually; but an auto
re-indent during commit wouldn't provide that history.
I do like the idea of more frequent, smaller pgindent runs instead of
doing just one giant run per year. With the giant runs it's necessary
to invest a fair amount of time eyeballing all the changes; if we did it
every couple weeks then the pain would be a lot less.
Another idea would be to have a bot that runs pgindent *without*
committing the results, and emails the people who have made commits
into files that changed, saying "if you don't like these diffs then
you'd better clean up your commit before it happens for real". With
some warning like that, it might be okay to do automatic reindenting
a little bit later. Plus, nagging committers who habitually commit
improperly-indented code might persuade them to clean up their acts ;-)
regards, tom lane
Andres Freund <andres@anarazel.de> writes:
Unfortunately that is, with the current tooling, not entirely trivial to
do so completely. The way we generate the list of known typedefs
unfortunately depends on the platform a build is run on. Therefore the
buildfarm collects a number of the generated list of typedefs from
different platforms, and then we use that combined list to run pgindent.
Yeah, it's hard to see how to avoid that given that the set of typedefs
provided by system headers is not fixed. (Windows vs not Windows is the
worst case of course, but Unixen aren't all alike either.)
Another gotcha that we have to keep our eyes on is that sometimes the
process finds spurious names that we don't want to treat as typedefs
because it results in messing up too much code. There's a reject list
in pgindent that takes care of those, but it has to be maintained
manually. So I'm not sure how that could work in conjunction with
unsupervised reindents --- by the time you notice a problem, the git
history will already be cluttered with bogus reindentations.
Maybe the secret is to not allow automated adoption of new typedefs.list
entries, but to require somebody to add entries to that file by hand,
even if they're basing it on the buildfarm results. (This would
encourage the habit some people have adopted of updating typedefs.list
along with commits that add typedefs. I've never done that, but would
be willing to change if there's good motivation to.)
regards, tom lane
On Wed, Aug 12, 2020 at 06:53:25PM -0400, Alvaro Herrera wrote:
On 2020-Aug-12, Andres Freund wrote:
Is there any reason we don't just automatically run pgindent regularly?
Like once a week? And also update typedefs.list automatically, while
we're at it?Seconded.
Thirded.
--
Michael
On Wed, Aug 12, 2020 at 07:47:01PM -0400, Tom Lane wrote:
Jesse Zhang <sbjesse@gmail.com> writes:
On Wed, Aug 12, 2020 at 3:34 PM Andres Freund wrote:
Is there any reason we don't just automatically run pgindent regularly?
Like once a week? And also update typedefs.list automatically, while
we're at it?You know what's better than weekly? Every check-in.
I'm not in favor of unsupervised pgindent runs, really. It can do a lot
of damage to code that was written without thinking about it --- in
particular, it'll make a hash of comment blocks that were manually
formatted and not protected with dashes.If the workflow is commit first and re-indent later, then we can always
revert the pgindent commit and clean things up manually; but an auto
re-indent during commit wouldn't provide that history.
There are competing implementations of assuring pgindent-cleanliness at every
check-in:
1. After each push, an automated followup commit appears, restoring
pgindent-cleanliness.
2. "git push" results in a commit that melds pgindent changes into what the
committer tried to push.
3. "git push" fails, for the master branch, if the pushed commit disrupts
pgindent-cleanliness.
Were you thinking of (2)? (1) doesn't have the lack-of-history problem, but
it does have the unexpected-damage problem, and it makes gitweb noisier. (3)
has neither problem, and I'd prefer it over (1), (2), or $SUBJECT.
Regarding typedefs.list, I would use the in-tree one, like you discussed here:
Show quoted text
On Wed, Aug 12, 2020 at 07:57:29PM -0400, Tom Lane wrote:
Maybe the secret is to not allow automated adoption of new typedefs.list
entries, but to require somebody to add entries to that file by hand,
even if they're basing it on the buildfarm results. (This would
encourage the habit some people have adopted of updating typedefs.list
along with commits that add typedefs. I've never done that, but would
be willing to change if there's good motivation to.)
Noah Misch <noah@leadboat.com> writes:
On Wed, Aug 12, 2020 at 07:47:01PM -0400, Tom Lane wrote:
If the workflow is commit first and re-indent later, then we can always
revert the pgindent commit and clean things up manually; but an auto
re-indent during commit wouldn't provide that history.
There are competing implementations of assuring pgindent-cleanliness at every
check-in:
1. After each push, an automated followup commit appears, restoring
pgindent-cleanliness.
2. "git push" results in a commit that melds pgindent changes into what the
committer tried to push.
3. "git push" fails, for the master branch, if the pushed commit disrupts
pgindent-cleanliness.
Were you thinking of (2)?
I was objecting to (2). (1) would perhaps work. (3) could be pretty
darn annoying, especially if it blocks a time-critical security patch.
Regarding typedefs.list, I would use the in-tree one, like you discussed here:
Yeah, after thinking about that more, it seems like automated
typedefs.list updates would be far riskier than automated reindent
based on the existing typedefs.list. The latter could at least be
expected not to change code unrelated to the immediate commit.
typedefs.list updates need some amount of adult supervision.
(I'd still vote for nag-mail to the committer whose work got reindented,
in case the bot made things a lot worse.)
I hadn't thought about the angle of HEAD versus back-branch patches,
but that does seem like something to ponder. The back branches don't
have the same pgindent rules necessarily, plus the patch versions
might be different in more than just whitespace. My own habit when
back-patching has been to indent the HEAD patch per-current-rules and
then preserve that layout as much as possible in the back branches,
but I doubt we could get a tool to do that with any reliability.
Of course, there's also the possibility of forcibly reindenting
all the active back branches to current rules. But I think we've
rejected that idea already, because it would cause so much pain
for forks that are following a back branch.
regards, tom lane
On Thu, Aug 13, 2020 at 12:08:36AM -0400, Tom Lane wrote:
Noah Misch <noah@leadboat.com> writes:
On Wed, Aug 12, 2020 at 07:47:01PM -0400, Tom Lane wrote:
If the workflow is commit first and re-indent later, then we can always
revert the pgindent commit and clean things up manually; but an auto
re-indent during commit wouldn't provide that history.There are competing implementations of assuring pgindent-cleanliness at every
check-in:1. After each push, an automated followup commit appears, restoring
pgindent-cleanliness.
2. "git push" results in a commit that melds pgindent changes into what the
committer tried to push.
3. "git push" fails, for the master branch, if the pushed commit disrupts
pgindent-cleanliness.Were you thinking of (2)?
I was objecting to (2). (1) would perhaps work. (3) could be pretty
darn annoying,
Right. I think of three use cases here:
a) I'm a committer who wants to push clean code. I want (3).
b) I'm a committer who wants to ignore pgindent. I get some email complaints
under (1), which I ignore. Under (3), I'm forced to become (a).
c) I'm reading the history. I want (3).
I hadn't thought about the angle of HEAD versus back-branch patches,
but that does seem like something to ponder. The back branches don't
have the same pgindent rules necessarily, plus the patch versions
might be different in more than just whitespace. My own habit when
back-patching has been to indent the HEAD patch per-current-rules and
then preserve that layout as much as possible in the back branches,
but I doubt we could get a tool to do that with any reliability.
Similar habit here. Another advantage of master-only is a guarantee against
disrupting time-critical patches. (It would be ugly to push back branches and
sort out the master push later, but it doesn't obstruct the mission.)
Noah Misch <noah@leadboat.com> writes:
... Another advantage of master-only is a guarantee against
disrupting time-critical patches. (It would be ugly to push back branches and
sort out the master push later, but it doesn't obstruct the mission.)
Hm, doesn't it? I had the idea that "git push" is atomic --- either all
the per-branch commits succeed, or they all fail. I might be wrong.
regards, tom lane
On Thu, Aug 13, 2020 at 01:14:33AM -0400, Tom Lane wrote:
Noah Misch <noah@leadboat.com> writes:
... Another advantage of master-only is a guarantee against
disrupting time-critical patches. (It would be ugly to push back branches and
sort out the master push later, but it doesn't obstruct the mission.)Hm, doesn't it? I had the idea that "git push" is atomic --- either all
the per-branch commits succeed, or they all fail. I might be wrong.
Atomicity is good. I just meant that you could issue something like "git push
origin $(cd .git/refs/heads && ls REL*)" to defer the complaint about master.
On Wed, Aug 12, 2020 at 10:14 PM Tom Lane wrote:
Noah Misch <noah@leadboat.com> writes:
... Another advantage of master-only is a guarantee against
disrupting time-critical patches. (It would be ugly to push back branches and
sort out the master push later, but it doesn't obstruct the mission.)Hm, doesn't it? I had the idea that "git push" is atomic --- either all
the per-branch commits succeed, or they all fail. I might be wrong.
As of Git 2.28, atomic pushes are not turned on by default. That means
"git push my-remote foo bar" _can_ result in partial success. I'm that
paranoid who does "git push --atomic my-remote foo bar fizz".
Cheers,
Jesse
On Thu, Aug 13, 2020 at 6:08 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Noah Misch <noah@leadboat.com> writes:
On Wed, Aug 12, 2020 at 07:47:01PM -0400, Tom Lane wrote:
If the workflow is commit first and re-indent later, then we can always
revert the pgindent commit and clean things up manually; but an auto
re-indent during commit wouldn't provide that history.There are competing implementations of assuring pgindent-cleanliness at
every
check-in:
1. After each push, an automated followup commit appears, restoring
pgindent-cleanliness.
2. "git push" results in a commit that melds pgindent changes into whatthe
committer tried to push.
3. "git push" fails, for the master branch, if the pushed commit disrupts
pgindent-cleanliness.
There's another option here as well, that is a bit "softer", is to use a
pre-commit hook.
That is, it's a hook that runs on the committers machine prior to the
commit. This hook can then yell "hey, you need to run pgindent before
committing this", but it gives the committer the ability to do --no-verify
and commit anyway (thus won't block things that are urgent).
Since it allows a simple bypass, and very much relies on the committer to
remember to install the hook in their local repository, this is not a
guarantee in any way. So it might need to be done together with something
else in the background doing like a daily job or so, but it might make that
background work be smaller and fewer changes.
This obviously only works in the case where we can rely on the committers
to remember to install such a hook, but given the few committers that we do
have, I think we can certainly get that up to an "acceptable rate" fairly
easily. FWIW, this is similar to what we do in the pgweb, pgeu and a few
other repositories, to ensure python styleguides are followed.
Were you thinking of (2)?
I was objecting to (2). (1) would perhaps work. (3) could be pretty
darn annoying, especially if it blocks a time-critical security patch.
FWIW, I agree that (2) seems like a really bad option. In that suddenly a
committer has committed something they were not aware of.
Regarding typedefs.list, I would use the in-tree one, like you discussed
here:
Yeah, after thinking about that more, it seems like automated
typedefs.list updates would be far riskier than automated reindent
based on the existing typedefs.list. The latter could at least be
expected not to change code unrelated to the immediate commit.
typedefs.list updates need some amount of adult supervision.(I'd still vote for nag-mail to the committer whose work got reindented,
in case the bot made things a lot worse.)
Yeah, I'm definitely not a big fan of automated commits, regardless of if
it's just indent or both indent+typedef. It's happened at least once, and I
think more than once, that we've had to basically hard reset the upstream
repository and clean things up after automated commits have gone bonkers
(hi, Bruce!). Having an automated system do the whole flow of
change->commit->push definitely invites this type of problem.
There are many solutions that do such things but that do it in the "github
workflow" way, which is they do change -> commit -> create pull request,
and then somebody eyeballs that pullrequest and approves it. We don't have
PRs, but we could either have a script that simply sends out a patch to a
mailinglist, or we could have a script that maintains a separate branch
which is auto-pgindented, and then have a committer squash-merge that
branch after having reviewed it.
Maybe something like that in combination with a pre-commit hook per above.
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Greetings,
* Magnus Hagander (magnus@hagander.net) wrote:
There's another option here as well, that is a bit "softer", is to use a
pre-commit hook.
Yeah, +1 on a pre-commit idea to help address this. I certainly agree
with Andres that it's quite annoying to deal with commits coming in that
aren't indented properly but are in some file that I'm working on.
This obviously only works in the case where we can rely on the committers
to remember to install such a hook, but given the few committers that we do
have, I think we can certainly get that up to an "acceptable rate" fairly
easily. FWIW, this is similar to what we do in the pgweb, pgeu and a few
other repositories, to ensure python styleguides are followed.
Yeah, no guarantee, but definitely seems like it'd be a good
improvement.
I was objecting to (2). (1) would perhaps work. (3) could be pretty
darn annoying, especially if it blocks a time-critical security patch.FWIW, I agree that (2) seems like a really bad option. In that suddenly a
committer has committed something they were not aware of.
Yeah, I dislike (2) a lot too.
Yeah, I'm definitely not a big fan of automated commits, regardless of if
it's just indent or both indent+typedef. It's happened at least once, and I
think more than once, that we've had to basically hard reset the upstream
repository and clean things up after automated commits have gone bonkers
(hi, Bruce!). Having an automated system do the whole flow of
change->commit->push definitely invites this type of problem.
Agreed, automated commits seems terribly risky.
There are many solutions that do such things but that do it in the "github
workflow" way, which is they do change -> commit -> create pull request,
and then somebody eyeballs that pullrequest and approves it. We don't have
PRs, but we could either have a script that simply sends out a patch to a
mailinglist, or we could have a script that maintains a separate branch
which is auto-pgindented, and then have a committer squash-merge that
branch after having reviewed it.Maybe something like that in combination with a pre-commit hook per above.
So, in our world, wouldn't this translate to 'make cfbot complain'?
I'm definitely a fan of the idea of having cfbot flag these and then we
maybe get to a point where it's not the committers dealing with fixing
patches that weren't pgindent'd properly, it's the actual patch
submitters being nagged about it...
Thanks,
Stephen
Stephen Frost <sfrost@snowman.net> writes:
So, in our world, wouldn't this translate to 'make cfbot complain'?
I'm definitely a fan of the idea of having cfbot flag these and then we
maybe get to a point where it's not the committers dealing with fixing
patches that weren't pgindent'd properly, it's the actual patch
submitters being nagged about it...
Meh. Asking all submitters to install pgindent is a bit of a burden.
Moreover, sometimes it's better to provide a patch that deliberately
hasn't reindented existing code, for ease of review (say, when you're
adding if() { ... } around some big hunk of code). I think getting
committers to do this as part of commit is a better workflow.
(Admittedly, since I've been doing that for a long time, I don't
find it to be a burden. I suppose some committers do.)
regards, tom lane
On Thu, Aug 13, 2020 at 6:30 PM Stephen Frost <sfrost@snowman.net> wrote:
* Magnus Hagander (magnus@hagander.net) wrote:
There are many solutions that do such things but that do it in the
"github
workflow" way, which is they do change -> commit -> create pull request,
and then somebody eyeballs that pullrequest and approves it. We don'thave
PRs, but we could either have a script that simply sends out a patch to a
mailinglist, or we could have a script that maintains a separate branch
which is auto-pgindented, and then have a committer squash-merge that
branch after having reviewed it.Maybe something like that in combination with a pre-commit hook per
above.
So, in our world, wouldn't this translate to 'make cfbot complain'?
I'm definitely a fan of the idea of having cfbot flag these and then we
maybe get to a point where it's not the committers dealing with fixing
patches that weren't pgindent'd properly, it's the actual patch
submitters being nagged about it...
Werll, that's one thing, but what I was thinking of here was more of an
automated branch maintained for the committers, not for the individual
patch owners.
That is:
1. Whenever a patch is pushed on master on the main repo a process kicked
off (or maybe wait 5 minutes to coalesce multiple patches if there are)
2. This process checks out master, and runs pgindent on it
3. When done, this gets committed to a new branch (or just overwrites an
existing branch of course, we don't need to maintain history here) like
"master-indented". This branch can be in a different repo, but one that
starts out as a clone of the main one
4. A committer (any committer) can then on regular basis examine the
differences by fetch + diff. If they're happy with it, cherry pick it in.
If not, figure out what needs to be done to adjust it.
Step 4 can be done at whatever interval we prefer, and we can have
something to nag us if head has been "off-indent" for too long.
This would be the backup for things that aren't indented during patch
commit, not other things.
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
On Wed, Aug 12, 2020 at 7:47 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I'm not in favor of unsupervised pgindent runs, really. It can do a lot
of damage to code that was written without thinking about it --- in
particular, it'll make a hash of comment blocks that were manually
formatted and not protected with dashes.
No committer should be committing code without thinking about
pgindent. If some are, they need to up their game.
I am not sure whether weekly or after-every-commit pgindent runs is a
good idea, but I think we should try to do it once a month or so. It's
too annoying otherwise. I could go either way on the question of
automation.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Aug 13, 2020 at 12:30 PM Stephen Frost <sfrost@snowman.net> wrote:
So, in our world, wouldn't this translate to 'make cfbot complain'?
This seems like it would be useful, but we'd have to figure out what
to do about typedefs.list. If the patch is indented with the current
one (which is auto-generated by the entire build farm, remember) it's
likely to mess up a patch that's otherwise properly formatted. We'd
either need to insist that people include updates to typedefs.list in
the patch, or else have the cfbot take a stab at doing those updates
itself.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
I am not sure whether weekly or after-every-commit pgindent runs is a
good idea, but I think we should try to do it once a month or so. It's
too annoying otherwise. I could go either way on the question of
automation.
The traditional reason for not doing pgindent too often has been that
it'd cause more work for people who have to rebase their patches over
pgindent's results. If we want to do it more often, then in order to
respond to that concern, I think we need to do it really often ---
not necessarily quite continuously, but often enough that pgindent
is only changing recently-committed code. In this way, it'd be likely
that anyone with a patch touching that same code would only need to
rebase once not twice. The approaches involving an automated run
give a guarantee of that, otherwise we don't have a guarantee; but
as long as it's not many days delay I think it wouldn't be bad.
Intervals on the order of a month seem likely to be the worst of
both worlds from this standpoint --- too long for people to wait
before rebasing their patch, yet short enough that they'd have
to do so repeatedly.
regards, tom lane