New TextDiffBuilder implementation

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

New TextDiffBuilder implementation

Levente Uzonyi-2
Hi,

We rewrote TextDiffBuilder using a modified version of the greedy longest
common subsequence algorithm from 'An O(ND) Difference Algorithm and Its
Variations (1986)' by Eugene W. Myers [1]. It's in The Inbox as:

System-klub.209. The implementation is smaller, faster, gives a minimal
patch and passes the tests.

                         Old        New      Ratio
Number of classes       5          4        0.8
Number of methods       50         29       0.58
Lines of code           387        197      0.51

We would appreciate if someone can review this before moving into trunk.

Cheers,
Levente & Balazs

[1] http://xmailserver.org/diff2.pdf

Reply | Threaded
Open this post in threaded view
|

Re: New TextDiffBuilder implementation

Andreas.Raab
Levente Uzonyi wrote:

> Hi,
>
> We rewrote TextDiffBuilder using a modified version of the greedy
> longest common subsequence algorithm from 'An O(ND) Difference Algorithm
> and Its Variations (1986)' by Eugene W. Myers [1]. It's in The Inbox as:
>
> System-klub.209. The implementation is smaller, faster, gives a minimal
> patch and passes the tests.
>
>                         Old        New      Ratio
> Number of classes       5          4        0.8
> Number of methods       50         29       0.58
> Lines of code           387        197      0.51
>
> We would appreciate if someone can review this before moving into trunk.

Looks good. There seem to be a few matches that are shown differently
(tried with various MC diffs) but that's to be expected since it's a
different algorithm. Ship it :-)

Cheers,
   - Andreas


Reply | Threaded
Open this post in threaded view
|

Re: Re: New TextDiffBuilder implementation

Levente Uzonyi-2
On Tue, 29 Dec 2009, Andreas Raab wrote:

> Levente Uzonyi wrote:
>> Hi,
>>
>> We rewrote TextDiffBuilder using a modified version of the greedy longest
>> common subsequence algorithm from 'An O(ND) Difference Algorithm and Its
>> Variations (1986)' by Eugene W. Myers [1]. It's in The Inbox as:
>>
>> System-klub.209. The implementation is smaller, faster, gives a minimal
>> patch and passes the tests.
>>
>>                         Old        New      Ratio
>> Number of classes       5          4        0.8
>> Number of methods       50         29       0.58
>> Lines of code           387        197      0.51
>>
>> We would appreciate if someone can review this before moving into trunk.
>
> Looks good. There seem to be a few matches that are shown differently (tried
> with various MC diffs) but that's to be expected since it's a different
> algorithm. Ship it :-)

Done. The main reason for the rewrite was that the old version didn't
create minimal patches, even for really simple cases (see the failing
tests). First I tried to fix it, but when I saw how bloated the
implementation was for such a simple thing, I thought a rewrite would be
better. Actually I couldn't find out how the old version worked.

Another reason for the differences in the diffs is that we accidentally
swapped the order of insertions and removals when they occured after each
other. This made only a "visual" difference. With the latest version, this
is fixed too.


Levente

>
> Cheers,
>  - Andreas
>
>
>

Reply | Threaded
Open this post in threaded view
|

Re: New TextDiffBuilder implementation

Tony Garnock-Jones-2
In reply to this post by Levente Uzonyi-2
Hi,

You might also be interested in http://www.squeaksource.com/DiffMerge.html.

Regards,
  Tony


Levente Uzonyi wrote:

> Hi,
>
> We rewrote TextDiffBuilder using a modified version of the greedy
> longest common subsequence algorithm from 'An O(ND) Difference Algorithm
> and Its Variations (1986)' by Eugene W. Myers [1]. It's in The Inbox as:
>
> System-klub.209. The implementation is smaller, faster, gives a minimal
> patch and passes the tests.
>
>                         Old        New      Ratio
> Number of classes       5          4        0.8
> Number of methods       50         29       0.58
> Lines of code           387        197      0.51
>
> We would appreciate if someone can review this before moving into trunk.
>
> Cheers,
> Levente & Balazs
>
> [1] http://xmailserver.org/diff2.pdf
>