git diff --patience

Started by Kevin Grittner, over 15 years ago, 11 messages, hackers

#1 Kevin Grittner
Kevin.Grittner@wicourts.gov

I just discovered the --patience flag on the git diff command, and
I'd like to suggest that we encourage people to use it when possible
for building patches. I just looked at output with and without it
(and for good measure, before and after filterdiff --format=context
for both), and the results were much better with this switch.

Here's a reference to the algorithm:

http://bramcohen.livejournal.com/73318.html

I think that page understates the benefits, though.

-Kevin

#2 Bruce Momjian
bruce@momjian.us
In reply to: Kevin Grittner (#1)
Re: git diff --patience

Kevin Grittner wrote:

I just discovered the --patience flag on the git diff command, and
I'd like to suggest that we encourage people to use it when possible
for building patches. I just looked at output with and without it
(and for good measure, before and after filterdiff --format=context
for both), and the results were much better with this switch.

Here's a reference to the algorithm:

http://bramcohen.livejournal.com/73318.html

I think that page understates the benefits, though.

I have seen the bracket example shown and the --patience output is
clearly nicer.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#3 Peter Eisentraut
peter_e@gmx.net
In reply to: Kevin Grittner (#1)
Re: git diff --patience

On Wed, 2010-09-15 at 12:58 -0500, Kevin Grittner wrote:

I just discovered the --patience flag on the git diff command, and
I'd like to suggest that we encourage people to use it when possible
for building patches. I just looked at output with and without it
(and for good measure, before and after filterdiff --format=context
for both), and the results were much better with this switch.

I have tried this switch various times now and haven't seen any
difference at all in the output. Do you have an existing commit where
you see a difference so I can try it and see if there is some other
problem that my local configuration has?

#4 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Peter Eisentraut (#3)
Re: git diff --patience

Peter Eisentraut <peter_e@gmx.net> wrote:

I have tried this switch various times now and haven't seen any
difference at all in the output. Do you have an existing commit
where you see a difference so I can try it and see if there is
some other problem that my local configuration has?

Having looked at it more, I find that the output with the switch is
usually the same as without; but when they differ, I always have
preferred the version with it on. Attached is the diff which caused
me to see if there was a way to make the diff output smarter, and
the result of adding the --patience flag.

This is the unified form that git puts out by default, but the
benefit is there after filterdiff --format=context, too.

-Kevin

Attachments:

patience-off.diff (application/octet-stream; +3 -126)
patience-on.diff (application/octet-stream; +5 -128)

#5 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Peter Eisentraut (#3)
Re: git diff --patience

Peter Eisentraut <peter_e@gmx.net> wrote:

Do you have an existing commit where you see a difference so I can
try it and see if there is some other problem that my local
configuration has?

Random poking around in the postgresql.git commits didn't turn up
any where it mattered, so here's before and after files for the
example diff files already posted. If you create a branch, commit the
before copy, and copy in the after copy, you should be able to
replicate the results I posted.
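
For anyone who wants to try the before and after files without setting up a branch, a rough sketch follows. It assumes git is on the PATH and swaps the branch-and-commit steps for `git diff --no-index`, which compares two files directly. The helper name `git_diff` is made up for illustration.

```python
import subprocess


def git_diff(before, after, patience=False):
    """Return the unified diff git produces for two files on disk.

    Hypothetical helper: uses --no-index instead of committing the
    "before" copy to a branch, which is an assumption of this sketch.
    """
    cmd = ["git", "diff", "--no-index"]
    if patience:
        cmd.append("--patience")
    cmd += [str(before), str(after)]
    # git exits with status 1 when the files differ, so check=True is
    # deliberately not used here.
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout
```

Calling `git_diff("predicate.h-before", "predicate.h-after", patience=True)` and then the same call with `patience=False` should give the two outputs to compare, assuming the attachments were saved under those names.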

-Kevin

Attachments:

predicate.h-before (application/octet-stream)
predicate.h-after (application/octet-stream)

#6 Gurjeet Singh
singh.gurjeet@gmail.com
In reply to: Kevin Grittner (#5)
Re: git diff --patience

Attached are two versions of the same patch, with and without --patience.

The with-patience version has only two hunks, removal of a big block of
comment and addition of a big block of code.

The without-patience version is riddled with a mix of the two hunks,
spread out up to line 120.

--patience is a clear winner here.

Regards,

On Wed, Sep 29, 2010 at 5:10 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:

Peter Eisentraut <peter_e@gmx.net> wrote:

Do you have an existing commit where you see a difference so I can
try it and see if there is some other problem that my local
configuration has?

Random poking around in the postgresql.git commits didn't turn up
any where it mattered, so here's before and after files for the
example diff files already posted. If you create a branch, commit the
before copy, and copy in the after copy, you should be able to
replicate the results I posted.

-Kevin

--
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com

singh.gurjeet@{ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet

Mail sent from my BlackLaptop device

Attachments:

diff_without_patience.patch (application/octet-stream; +141 -23)
diff_with_patience.patch (application/octet-stream; +147 -29)

#7 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Gurjeet Singh (#6)
Re: git diff --patience

Gurjeet Singh <singh.gurjeet@gmail.com> wrote:

The with-patience version has only two hunks, removal of a big
block of comment and addition of a big block of code.

The without-patience version is riddled with a mix of the two
hunks, spread out up to line 120.

--patience is a clear winner here.

When I read the description of the algorithm, I can't imagine a
situation where --patience would make the diff *worse*. I was
somewhat afraid (based on the name) that it would be slow; but
if it is slower, it hasn't been by enough for me to notice it.

-Kevin

#8 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Kevin Grittner (#7)
Re: git diff --patience

Excerpts from Kevin Grittner's message of Thu Sep 30 16:38:11 -0400 2010:

When I read the description of the algorithm, I can't imagine a
situation where --patience would make the diff *worse*. I was
somewhat afraid (based on the name) that it would be slow; but
if it is slower, it hasn't been by enough for me to notice it.

There is a very simple example posted on some of the blog posts that
goes something like

aaaaaaaa
aaaaaaaa
aaaaaaaa
bbbbbbbb
bbbbbbbb
bbbbbbbb
xyz

and the "xyz" is moved to the front. In this corner case, the patience
diff is a lot worse.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#9 Gurjeet Singh
singh.gurjeet@gmail.com
In reply to: Alvaro Herrera (#8)
Re: git diff --patience

On Fri, Oct 1, 2010 at 9:38 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

Excerpts from Kevin Grittner's message of Thu Sep 30 16:38:11 -0400 2010:

When I read the description of the algorithm, I can't imagine a
situation where --patience would make the diff *worse*. I was
somewhat afraid (based on the name) that it would be slow; but
if it is slower, it hasn't been by enough for me to notice it.

There is a very simple example posted on some of the blog posts that
goes something like

aaaaaaaa
aaaaaaaa
aaaaaaaa
bbbbbbbb
bbbbbbbb
bbbbbbbb
xyz

and the "xyz" is moved to the front. In this corner case, the patience
diff is a lot worse.

Sorry, but that example didn't make much sense to me. Can you please
elaborate, or maybe share those blog posts you are referring to.

Regards,
--
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com

singh.gurjeet@{ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet

Mail sent from my BlackLaptop device

#10 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Gurjeet Singh (#9)
Re: git diff --patience

Gurjeet Singh <singh.gurjeet@gmail.com> wrote:

Alvaro Herrera <alvherre@commandprompt.com> wrote:

There is a very simple example posted on some of the blog posts
that goes something like

aaaaaaaa
aaaaaaaa
aaaaaaaa
bbbbbbbb
bbbbbbbb
bbbbbbbb
xyz

and the "xyz" is moved to the front. In this corner case, the
patience diff is a lot worse.

Sorry, but that example didn't make much sense to me. Can you
please elaborate, or maybe share those blog posts you are referring
to.

I tried it out. Here are the results:

git diff --color
diff --git a/a1 b/a1
index bd0586b..32736d1 100644
--- a/a1
+++ b/a1
@@ -1,7 +1,7 @@
+xyz
 aaaaaaaa
 aaaaaaaa
 aaaaaaaa
 bbbbbbbb
 bbbbbbbb
 bbbbbbbb
-xyz

git diff --color --patience
diff --git a/a1 b/a1
index bd0586b..32736d1 100644
--- a/a1
+++ b/a1
@@ -1,7 +1,7 @@
-aaaaaaaa
-aaaaaaaa
-aaaaaaaa
-bbbbbbbb
-bbbbbbbb
-bbbbbbbb
 xyz
+aaaaaaaa
+aaaaaaaa
+aaaaaaaa
+bbbbbbbb
+bbbbbbbb
+bbbbbbbb

This is because lines which only occur once in a file are the
"anchors" around which lines which occur multiple times move --
after keeping intact any leading and trailing lines which match
between the files. An interesting exercise is to think about what
real-life lines you could have which would have multiple occurrences
in this pattern, and think about whether you would then prefer the
--patience output, especially if this were part of a larger file.
Even in this supposed "worst case" example, I'm not at all sure I
wouldn't prefer --patience, personally, even though more lines are
flagged.
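
The anchor selection described above can be sketched roughly as follows. This covers only the first step of the patience algorithm (real implementations also order the anchors with a longest-common-subsequence pass and recurse between them), and the function names are invented for the example.

```python
from collections import Counter


def trim_common_ends(old, new):
    """Strip matching leading and trailing lines; return the differing middles."""
    start = 0
    while start < len(old) and start < len(new) and old[start] == new[start]:
        start += 1
    end = 0
    while (end < len(old) - start and end < len(new) - start
           and old[-1 - end] == new[-1 - end]):
        end += 1
    return old[start:len(old) - end], new[start:len(new) - end]


def unique_common_lines(old, new):
    """Lines occurring exactly once on each side: the candidate anchors."""
    old_counts, new_counts = Counter(old), Counter(new)
    return [line for line in old
            if old_counts[line] == 1 and new_counts[line] == 1]


# The example from this thread: "xyz" moves from the end to the front.
old = ["aaaaaaaa"] * 3 + ["bbbbbbbb"] * 3 + ["xyz"]
new = ["xyz"] + ["aaaaaaaa"] * 3 + ["bbbbbbbb"] * 3

old_mid, new_mid = trim_common_ends(old, new)
anchors = unique_common_lines(old_mid, new_mid)
print(anchors)

# With "xyz" as the only anchor, everything before it in the old file is
# reported as removed and everything after it in the new file as added.
anchor = anchors[0]
removed = old_mid[:old_mid.index(anchor)]
added = new_mid[new_mid.index(anchor) + 1:]
print(len(removed), len(added))
```

No leading or trailing lines match here, so nothing is trimmed, and the repeated lines cannot serve as anchors. That leaves "xyz" as the single fixed point, which is why the --patience output flags all six repeated lines on both sides.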

-Kevin

#11 Bruce Momjian
bruce@momjian.us
In reply to: Kevin Grittner (#10)
Re: git diff --patience

On Fri, Oct 1, 2010 at 7:15 AM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

An interesting exercise is to think about what
real-life lines you could have which would have multiple occurrences
in this pattern, and think about whether you would then prefer the
--patience output, especially if this were part of a larger file.

The linux-kernel mailing list had examples of this occurring in real
life too. In real C programs, function signatures usually end up being
the unique lines, which is what you want, but it can happen that
surprising lines are unique. Even braces can be unique if a given
indentation level is only used once.

The discussion basically convinced me that using uniqueness alone is a
bad idea but that the basic idea of trying to identify the important
lines is a fine idea. It's just that uniqueness turns out to be a
relatively weak signal for interesting lines. Linus suggested
line-length but it's pretty debatable which is better.
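
To see how braces can end up unique, here is a small sketch that lists the lines occurring exactly once in a made-up C function. It only looks at one file, whereas patience diff wants lines that are unique in both versions, so treat it as an illustration of the uniqueness signal rather than of the diff itself.

```python
from collections import Counter

# Invented C function, used only to show which lines are unique.
c_source = """\
static int
lookup(int key)
{
    if (key < 0)
    {
        return -1;
    }
    for (int i = 0; i < n; i++)
    {
        if (table[i] == key)
        {
            return i;
        }
    }
    return -1;
}
""".splitlines()

counts = Counter(c_source)
unique = [line for line in c_source if counts[line] == 1]

for line in unique:
    print(repr(line))
```

The braces indented by four spaces appear twice and drop out, but the pair indented by eight spaces appears only once, so it counts as unique alongside the function signature. The outermost braces are unique too, which is the kind of surprising anchor the paragraph above describes.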

--
greg