Smalltalk › Squeak › Squeak - Dev

[ANN] [Cuis] Cuis 2.1 is available. Lots of cool stuff inside!

_‹ Previous Topic Next Topic _›

Classic

List

Threaded

15 messages Options

Juan Vuletich-4

[ANN] [Cuis] Cuis 2.1 is available. Lots of cool stuff inside!

Hi Folks,

I've just updated Cuis to 2.1 at
http://www.jvuletich.org/Cuis/Index.html . From the release notes:

Cuis Release notes
Only the most relevant Cuis specific changes are detailed here. To see
all of them, browse the numbered changes themselves, that are part of
this release. Cuis official site is
http://www.jvuletich.org/Cuis/Index.html .

New in Cuis 2.1
- Support for the Unary numeral system, as suggested by Dan Ingalls at
http://lists.squeakfoundation.org/pipermail/squeak-dev/2000-March/013368.html
- A new code differ that shows differences in words and not lines, by
Leandro Caniglia
- Closure measurements (based on work by Eliot Miranda) are shown in the
annotation pane for any method
- Removal of 43 isXXX methods, replaced by the general #is: method
- Misc. fixes and enhancements from Squeak and/or Pharo

New in Cuis 2.0
- Cuis is now closure-enabled. The closure examples xxxx work, and Cuis
is ready to run on the Cog VM (when available). Cuis 2.0 requires a
closures-enabled VM

I recommend everybody to see the new class comment in Number, and try
the new number format! The alternative code differ by Leando, based on
words and not lines is worth trying too. Finally, the Closure
measurements for every method are instructive and useful. Warning: I
fixed 2 bugs in Eliot's code, as I said in
http://lists.squeakfoundation.org/pipermail/squeak-dev/2010-February/144120.html
.

Comments welcome.
Cheers,
Juan Vuletich

Levente Uzonyi-2

Re: [ANN] [Cuis] Cuis 2.1 is available. Lots of cool stuff inside!

On Fri, 12 Feb 2010, Juan Vuletich wrote:

> Hi Folks,
>
> I've just updated Cuis to 2.1 at http://www.jvuletich.org/Cuis/Index.html .
> From the release notes:
>
> Cuis Release notes
> Only the most relevant Cuis specific changes are detailed here. To see all of
> them, browse the numbered changes themselves, that are part of this release.
> Cuis official site is http://www.jvuletich.org/Cuis/Index.html .
>
> New in Cuis 2.1
> - Support for the Unary numeral system, as suggested by Dan Ingalls at
> http://lists.squeakfoundation.org/pipermail/squeak-dev/2000-March/013368.html
> - A new code differ that shows differences in words and not lines, by Leandro
> Caniglia
> - Closure measurements (based on work by Eliot Miranda) are shown in the
> annotation pane for any method
> - Removal of 43 isXXX methods, replaced by the general #is: method
> - Misc. fixes and enhancements from Squeak and/or Pharo
>
> New in Cuis 2.0
> - Cuis is now closure-enabled. The closure examples xxxx work, and Cuis is
> ready to run on the Cog VM (when available). Cuis 2.0 requires a
> closures-enabled VM
>
> I recommend everybody to see the new class comment in Number, and try the new
> number format! The alternative code differ by Leando, based on words and not
> lines is worth trying too. Finally, the Closure measurements for every method
> are instructive and useful. Warning: I fixed 2 bugs in Eliot's code, as I
> said in
> http://lists.squeakfoundation.org/pipermail/squeak-dev/2010-February/144120.html
> .

I tried the new number format, it works, but Shout doesn't know about it
and 1r1e1 is parsed as 1r1 e1.

You can also get the word based diff using TextDiffBuilder by subclassing
it and overriding #split:. Actually ClassDiffBuilder does exactly this, so
using ClassDiffBuilder instead of TextDiffBuilder will give you the
expected result.

The Closure measurements are really cool, but I think the descriptions
are a bit too long for a single line annotation pane.

Levente

>
> Comments welcome.
> Cheers,
> Juan Vuletich
>
>

Leandro Caniglia

Re: [ANN] [Cuis] Cuis 2.1 is available. Lots of cool stuff inside!

Levente Uzonyi wrote:

You can also get the word based diff using TextDiffBuilder by subclassing it and overriding #split:. Actually ClassDiffBuilder does exactly this, so using ClassDiffBuilder instead of TextDiffBuilder will give you the expected result.

I think that the difference with the implementation Valeria and I did is that ours switches from lines to words automatically. It also defines "words" as Smalltalk words, meaning that it understands keyword limits, which gives better results. The differentiator starts comparing lines, and if the proportion of line differences isn't too high, then it reutilizes the work already done (with lines) and refines it to get differences in between keywords. The refining strategy was crucial for achieving good performance.

Not sure if the full functionality I'm describing here is in the last Cuis version though.

/Leandro

Eliot Miranda-2

Re: [ANN] [Cuis] Cuis 2.1 is available. Lots of cool stuff inside!

On Fri, Feb 12, 2010 at 12:53 PM, Leandro <[hidden email]> wrote:

Levente Uzonyi wrote:

You can also get the word based diff using TextDiffBuilder by subclassing it and overriding #split:. Actually ClassDiffBuilder does exactly this, so using ClassDiffBuilder instead of TextDiffBuilder will give you the expected result.

I think that the difference with the implementation Valeria and I did is that ours switches from lines to words automatically. It also defines "words" as Smalltalk words, meaning that it understands keyword limits, which gives better results. The differentiator starts comparing lines, and if the proportion of line differences isn't too high, then it reutilizes the work already done (with lines) and refines it to get differences in between keywords. The refining strategy was crucial for achieving good performance.

What does it do with inst var names within strings as they appear in class definitions, e.g. can they pick out the one work difference between these two?

StackInterpreter subclass: #CoInterpreter

instanceVariableNames: 'cogit cogMethodZone inFullGC cogCodeSize desiredCogCodeSize heapBase linkSends lastCoggableInterpretedBlockMethod reenterInterpreter deferSmash deferredSmash traceLog traceLogIndex traceSources cogCompiledCodeCompactionCalledFor statCodeCompactionCount statCodeCompactionUsecs lastUncoggableInterpretedBlockMethod processHasThreadId flagInterpretedMethods maxLiteralCountForCompile'

classVariableNames: 'CSCallbackEnter CSCallbackLeave CSCheckEvents CSEnterCriticalSection CSExitCriticalSection CSOwnVM CSResume CSSignal CSSuspend CSThreadSchedulingLoop CSWait CSYield HasBeenReturnedFromMCPC TraceBlockActivation TraceBlockCreation TraceBufferSize TraceCodeCompaction TraceContextSwitch TraceFullGC TraceIncrementalGC TraceIsFromInterpreter TraceIsFromMachineCode TraceSources'

poolDictionaries: 'VMStackFrameOffsets CogMethodConstants'

category: 'VMMaker-JIT'

StackInterpreter subclass: #CoInterpreter

instanceVariableNames: 'cogit cogMethodZone inFullGC cogCodeSize desiredCogCodeSize heapBase lastCoggableInterpretedBlockMethod reenterInterpreter deferSmash deferredSmash traceLog traceLogIndex traceSources cogCompiledCodeCompactionCalledFor statCodeCompactionCount statCodeCompactionUsecs lastUncoggableInterpretedBlockMethod processHasThreadId flagInterpretedMethods maxLiteralCountForCompile'

poolDictionaries: 'VMStackFrameOffsets CogMethodConstants'

category: 'VMMaker-JIT'

Not sure if the full functionality I'm describing here is in the last Cuis version though.

/Leandro

Leandro Caniglia

Re: [ANN] [Cuis] Cuis 2.1 is available. Lots of cool stuff inside!

It produces a result which contains the class definition where the string ' linkSends' (note the space at: 1) is stricken out and all the rest is just regular text.

Here is the code:

base := <class definition one>.
base := <class definition two>.
diff := DifferenceFinder base: base case: case.
text := diff sourceCodeDifferences anyOne asText.

/Leandro

Eliot Miranda wrote:

On Fri, Feb 12, 2010 at 12:53 PM, Leandro <[hidden email]> wrote:

Levente Uzonyi wrote:

You can also get the word based diff using TextDiffBuilder by subclassing it and overriding #split:. Actually ClassDiffBuilder does exactly this, so using ClassDiffBuilder instead of TextDiffBuilder will give you the expected result.

I think that the difference with the implementation Valeria and I did is that ours switches from lines to words automatically. It also defines "words" as Smalltalk words, meaning that it understands keyword limits, which gives better results. The differentiator starts comparing lines, and if the proportion of line differences isn't too high, then it reutilizes the work already done (with lines) and refines it to get differences in between keywords. The refining strategy was crucial for achieving good performance.

What does it do with inst var names within strings as they appear in class definitions, e.g. can they pick out the one work difference between these two?

StackInterpreter subclass: #CoInterpreter

instanceVariableNames: 'cogit cogMethodZone inFullGC cogCodeSize desiredCogCodeSize heapBase linkSends lastCoggableInterpretedBlockMethod reenterInterpreter deferSmash deferredSmash traceLog traceLogIndex traceSources cogCompiledCodeCompactionCalledFor statCodeCompactionCount statCodeCompactionUsecs lastUncoggableInterpretedBlockMethod processHasThreadId flagInterpretedMethods maxLiteralCountForCompile'

classVariableNames: 'CSCallbackEnter CSCallbackLeave CSCheckEvents CSEnterCriticalSection CSExitCriticalSection CSOwnVM CSResume CSSignal CSSuspend CSThreadSchedulingLoop CSWait CSYield HasBeenReturnedFromMCPC TraceBlockActivation TraceBlockCreation TraceBufferSize TraceCodeCompaction TraceContextSwitch TraceFullGC TraceIncrementalGC TraceIsFromInterpreter TraceIsFromMachineCode TraceSources'

poolDictionaries: 'VMStackFrameOffsets CogMethodConstants'

category: 'VMMaker-JIT'

StackInterpreter subclass: #CoInterpreter

instanceVariableNames: 'cogit cogMethodZone inFullGC cogCodeSize desiredCogCodeSize heapBase lastCoggableInterpretedBlockMethod reenterInterpreter deferSmash deferredSmash traceLog traceLogIndex traceSources cogCompiledCodeCompactionCalledFor statCodeCompactionCount statCodeCompactionUsecs lastUncoggableInterpretedBlockMethod processHasThreadId flagInterpretedMethods maxLiteralCountForCompile'

classVariableNames: 'CSCallbackEnter CSCallbackLeave CSCheckEvents CSEnterCriticalSection CSExitCriticalSection CSOwnVM CSResume CSSignal CSSuspend CSThreadSchedulingLoop CSWait CSYield HasBeenReturnedFromMCPC TraceBlockActivation TraceBlockCreation TraceBufferSize TraceCodeCompaction TraceContextSwitch TraceFullGC TraceIncrementalGC TraceIsFromInterpreter TraceIsFromMachineCode TraceSources'

poolDictionaries: 'VMStackFrameOffsets CogMethodConstants'

category: 'VMMaker-JIT'

Not sure if the full functionality I'm describing here is in the last Cuis version though.

/Leandro
  

K. K. Subramaniam

Re: [ANN] [Cuis] Cuis 2.1 is available. Lots of cool stuff inside!

In reply to this post by Juan Vuletich-4

On Friday 12 February 2010 08:53:36 pm Juan Vuletich wrote:
> - Support for the Unary numeral system, as suggested by Dan Ingalls at
> http://lists.squeakfoundation.org/pipermail/squeak-dev/2000-March/013368.ht
> ml
1r1111 = 4.
Yay! I suppose this make Cuis the first Smalltalk to support tally counts. I
noticed you also fixed 1.0e+2 to return 100.0.

> - Closure measurements (based on work by Eliot Miranda) are shown in the
> annotation pane for any method
Just to keep noise levels down, you may want to drop "No real (non-optimized)
closures". When there is nothing to report, report nothing.

> - Removal of 43 isXXX methods, replaced by the general #is: method
Could this be generalized in Object? e.g.
is: aSymbol
^self class = (Smalltalk at: aSymbol) or: [ super is: aSymbol ]

isA: aClass
^self class = aClass or: [ super isA: aClass ].

Use of ALT+arrow to cycle through windows is a nice addition for keyboarders
like me. I would have preferred to use ALT+SPACE (or one of special character
keys) to do the same because arrow keys are placed differently on different
keyboards.

A nicely thought out update. Thanks .. Subbu

Nicolas Cellier

Re: [ANN] [Cuis] Cuis 2.1 is available. Lots of cool stuff inside!

2010/2/13 K. K. Subramaniam <[hidden email]>:
> On Friday 12 February 2010 08:53:36 pm Juan Vuletich wrote:
>> - Support for the Unary numeral system, as suggested by Dan Ingalls at
>> http://lists.squeakfoundation.org/pipermail/squeak-dev/2000-March/013368.ht
>> ml
> 1r1111 = 4.
> Yay! I suppose this make Cuis the first Smalltalk to support tally counts. I
> noticed you also fixed 1.0e+2 to return 100.0.
>

which is IMO absolutely useless in the context of Smalltalk methods,
and notably insufficient in the context of more general application
reading numbers...
Unless you also handled common formats like (1.e+2 .1e+3 +100...) ?

Nicolas

>> - Closure measurements (based on work by Eliot Miranda) are shown in the
>> annotation pane for any method
> Just to keep noise levels down, you may want to drop "No real (non-optimized)
> closures". When there is nothing to report, report nothing.
>
>> - Removal of 43 isXXX methods, replaced by the general #is: method
> Could this be generalized in Object? e.g.
> is: aSymbol
> ^self class = (Smalltalk at: aSymbol) or: [ super is: aSymbol ]
>
> isA: aClass
> ^self class = aClass or: [ super isA: aClass ].
>
>
> Use of ALT+arrow to cycle through windows is a nice addition for keyboarders
> like me. I would have preferred to use ALT+SPACE (or one of special character
> keys) to do the same because arrow keys are placed differently on different
> keyboards.
>
> A nicely thought out update. Thanks .. Subbu
>
>

Juan Vuletich-4

Re: [ANN] [Cuis] Cuis 2.1 is available. Lots of cool stuff inside!

In reply to this post by Leandro Caniglia

Leandro wrote:

>
> Levente Uzonyi wrote:
>>
>> You can also get the word based diff using TextDiffBuilder by
>> subclassing it and overriding #split:. Actually ClassDiffBuilder does
>> exactly this, so using ClassDiffBuilder instead of TextDiffBuilder
>> will give you the expected result.
>>
>
> I think that the difference with the implementation Valeria and I did
> is that ours switches from lines to words automatically. It also
> defines "words" as Smalltalk words, meaning that it understands
> keyword limits, which gives better results. The differentiator starts
> comparing lines, and if the proportion of line differences isn't too
> high, then it reutilizes the work already done (with lines) and
> refines it to get differences in between keywords. The refining
> strategy was crucial for achieving good performance.
>
> Not sure if the full functionality I'm describing here is in the last
> Cuis version though.
>
> /Leandro

Yes it is.

BTW, today I took a closer look at your code and learned it is a
superset of what TextDiffBuilder can do. So, I'm removing
TextDiffBuilder from Cuis right now.

Cheers,
Juan Vuletich

Juan Vuletich-4

Re: [ANN] [Cuis] Cuis 2.1 is available. Lots of cool stuff inside!

In reply to this post by Levente Uzonyi-2

Levente Uzonyi wrote:

> On Fri, 12 Feb 2010, Juan Vuletich wrote:
>
>> Hi Folks,
>>
>> I've just updated Cuis to 2.1 at
>> http://www.jvuletich.org/Cuis/Index.html . From the release notes:
>>
>> Cuis Release notes
>> Only the most relevant Cuis specific changes are detailed here. To
>> see all of them, browse the numbered changes themselves, that are
>> part of this release. Cuis official site is
>> http://www.jvuletich.org/Cuis/Index.html .
>>
>> New in Cuis 2.1
>> - Support for the Unary numeral system, as suggested by Dan Ingalls
>> at
>> http://lists.squeakfoundation.org/pipermail/squeak-dev/2000-March/013368.html
>>
>> - A new code differ that shows differences in words and not lines, by
>> Leandro Caniglia
>> - Closure measurements (based on work by Eliot Miranda) are shown in
>> the annotation pane for any method
>> - Removal of 43 isXXX methods, replaced by the general #is: method
>> - Misc. fixes and enhancements from Squeak and/or Pharo
>>
>> New in Cuis 2.0
>> - Cuis is now closure-enabled. The closure examples xxxx work, and
>> Cuis is ready to run on the Cog VM (when available). Cuis 2.0
>> requires a closures-enabled VM
>>
>> I recommend everybody to see the new class comment in Number, and try
>> the new number format! The alternative code differ by Leando, based
>> on words and not lines is worth trying too. Finally, the Closure
>> measurements for every method are instructive and useful. Warning: I
>> fixed 2 bugs in Eliot's code, as I said in
>> http://lists.squeakfoundation.org/pipermail/squeak-dev/2010-February/144120.html
>> .
>
> I tried the new number format, it works, but Shout doesn't know about
> it and 1r1e1 is parsed as 1r1 e1.

Thanks, I forgot about Shout. Will do a new release soon, it seems!

> You can also get the word based diff using TextDiffBuilder by
> subclassing it and overriding #split:. Actually ClassDiffBuilder does
> exactly this, so using ClassDiffBuilder instead of TextDiffBuilder
> will give you the expected result.
>
> The Closure measurements are really cool, but I think the descriptions
> are a bit too long for a single line annotation pane.

Yes. I'll make the default height a bit larger. I'm always enlarging it now.

Cheers,
Juan Vuletich

>
> Levente

Juan Vuletich-4

Re: [ANN] [Cuis] Cuis 2.1 is available. Lots of cool stuff inside!

In reply to this post by K. K. Subramaniam

K. K. Subramaniam wrote:

> On Friday 12 February 2010 08:53:36 pm Juan Vuletich wrote:
>
>> - Support for the Unary numeral system, as suggested by Dan Ingalls at
>> http://lists.squeakfoundation.org/pipermail/squeak-dev/2000-March/013368.ht
>> ml
>>
> 1r1111 = 4.
> Yay! I suppose this make Cuis the first Smalltalk to support tally counts. I
> noticed you also fixed 1.0e+2 to return 100.0.
>

I didn't :) . I just never loaded the updates that produce that behavior.

>> - Closure measurements (based on work by Eliot Miranda) are shown in the
>> annotation pane for any method
>>
> Just to keep noise levels down, you may want to drop "No real (non-optimized)
> closures". When there is nothing to report, report nothing.
>

Mhhh. Sounds reasonable. Will try it.

>> - Removal of 43 isXXX methods, replaced by the general #is: method
>>
> Could this be generalized in Object? e.g.
> is: aSymbol
> ^self class = (Smalltalk at: aSymbol) or: [ super is: aSymbol ]
>
> isA: aClass
> ^self class = aClass or: [ super isA: aClass ].
>

This, as you wrote it, won't work. Try it! Anyway, what you suggest is
similar to #isKindOf:. The difference is that #isXXX and #is: are not
tied to the hierarchy at all. I see them more as a test for a certain
protocol. That's why #isKindOf: is so bad. It is tied to the hierarchy.
See
http://lists.squeakfoundation.org/pipermail/squeak-dev/2009-December/141694.html
.

> Use of ALT+arrow to cycle through windows is a nice addition for keyboarders
> like me. I would have preferred to use ALT+SPACE (or one of special character
> keys) to do the same because arrow keys are placed differently on different
> keyboards.
>

I see. Will take a look. Thanks for the suggestion.

> A nicely thought out update. Thanks .. Subbu

Thanks.
Cheers,
Juan Vuletich

Levente Uzonyi-2

Re: [ANN] [Cuis] Cuis 2.1 is available. Lots of cool stuff inside!

In reply to this post by K. K. Subramaniam

On Sat, 13 Feb 2010, K. K. Subramaniam wrote:

> On Friday 12 February 2010 08:53:36 pm Juan Vuletich wrote:
>> - Support for the Unary numeral system, as suggested by Dan Ingalls at
>> http://lists.squeakfoundation.org/pipermail/squeak-dev/2000-March/013368.ht
>> ml
> 1r1111 = 4.
> Yay! I suppose this make Cuis the first Smalltalk to support tally counts. I
> noticed you also fixed 1.0e+2 to return 100.0.
>
>> - Closure measurements (based on work by Eliot Miranda) are shown in the
>> annotation pane for any method
> Just to keep noise levels down, you may want to drop "No real (non-optimized)
> closures". When there is nothing to report, report nothing.

There are two cases when this is reported:
1. there are no blocks at all in the method
2. there are blocks, but all are inlined

The information is useful in the second case.

>
>> - Removal of 43 isXXX methods, replaced by the general #is: method
> Could this be generalized in Object? e.g.
> is: aSymbol
> ^self class = (Smalltalk at: aSymbol) or: [ super is: aSymbol ]
>
> isA: aClass
> ^self class = aClass or: [ super isA: aClass ].

That would be the same as #isKindOf:.

Levente

>
> Use of ALT+arrow to cycle through windows is a nice addition for keyboarders
> like me. I would have preferred to use ALT+SPACE (or one of special character
> keys) to do the same because arrow keys are placed differently on different
> keyboards.
>
> A nicely thought out update. Thanks .. Subbu
>
>

Levente Uzonyi-2

Re: [ANN] [Cuis] Cuis 2.1 is available. Lots of cool stuff inside!

In reply to this post by Leandro Caniglia

On Fri, 12 Feb 2010, Leandro wrote:

>
> Levente Uzonyi wrote:
>
> You can also get the word based diff using TextDiffBuilder by subclassing it and overriding #split:. Actually ClassDiffBuilder does exactly
> this, so using ClassDiffBuilder instead of TextDiffBuilder will give you the expected result.
>
>
> I think that the difference with the implementation Valeria and I did is that ours switches from lines to words automatically. It also defines "words"
> as Smalltalk words, meaning that it understands keyword limits, which gives better results. The differentiator starts comparing lines, and if the
> proportion of line differences isn't too high, then it reutilizes the work already done (with lines) and refines it to get differences in between
> keywords. The refining strategy was crucial for achieving good performance.

If you subclass ClassDiffBuilder (let's call the subclass KeywordDiffBuilder)
and implement KeywordDiffBuilder >> #split: just like DifferenceFinder >>
#keywordsAndBlanksFrom: you get even better results.

DifferenceFinder has to use such tricks/hacks to have good performance,
because it uses the simplest lcs algorithm which requires n*m time and
space in all cases, while for practical cases TextDiffBuilder requires
O(n+m) time and space.

Levente

>
> Not sure if the full functionality I'm describing here is in the last Cuis version though.
>
> /Leandro
>
>

Leandro Caniglia

Re: [ANN] [Cuis] Cuis 2.1 is available. Lots of cool stuff inside!

Levente,

Thank you for remarking this. Could you please suggest a reference where
I can learn more about the O(n+m) algorithm? (Note however that our
"trick" could also be useful with the fast algorithm because it avoids
comparing individual words of lines that are common).

/Leandro

Levente Uzonyi wrote:

> On Fri, 12 Feb 2010, Leandro wrote:
>
>>
>> Levente Uzonyi wrote:
>>
>> You can also get the word based diff using TextDiffBuilder by
>> subclassing it and overriding #split:. Actually ClassDiffBuilder does
>> exactly
>> this, so using ClassDiffBuilder instead of TextDiffBuilder will
>> give you the expected result.
>>
>>
>> I think that the difference with the implementation Valeria and I did
>> is that ours switches from lines to words automatically. It also
>> defines "words"
>> as Smalltalk words, meaning that it understands keyword limits, which
>> gives better results. The differentiator starts comparing lines, and
>> if the
>> proportion of line differences isn't too high, then it reutilizes the
>> work already done (with lines) and refines it to get differences in
>> between
>> keywords. The refining strategy was crucial for achieving good
>> performance.
>
> If you subclass ClassDiffBuilder (let's call the subclass
> KeywordDiffBuilder)
> and implement KeywordDiffBuilder >> #split: just like DifferenceFinder
> >> #keywordsAndBlanksFrom: you get even better results.
>
> DifferenceFinder has to use such tricks/hacks to have good
> performance, because it uses the simplest lcs algorithm which requires
> n*m time and space in all cases, while for practical cases
> TextDiffBuilder requires O(n+m) time and space.
>
>
> Levente
>
>>
>> Not sure if the full functionality I'm describing here is in the last
>> Cuis version though.
>>
>> /Leandro
>>
>>
>

Levente Uzonyi-2

Re: [ANN] [Cuis] Cuis 2.1 is available. Lots of cool stuff inside!

On Sat, 13 Feb 2010, Leandro wrote:

> Levente,
>
> Thank you for remarking this. Could you please suggest a reference where I
> can learn more about the O(n+m) algorithm? (Note however that our "trick"
> could also be useful with the fast algorithm because it avoids comparing
> individual words of lines that are common).

TextDiffBuilder uses a modified version of the algorithm described here:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.4.6927
You can find our implementation in TextDiffBuilder >> #lcsFor:and:.
O(n+m) is only true for typical source code/text diffs, where the
number of differences between the two sequences is small. TextDiffBuilder
also uses an extra filtering which reduces n, m and the number of
differences by removing elements which occur in only one of the sequences
(this is common if the elements are limes).

Note that there are plenty of algorithms with various time/space
complexity for the lcs problem, even hierachical ones which may be better
for line+word diffs.

We chose this one, because it's widely used (current unix diff
implementations use the recursive version of this algorithm which has
guaranteed O(n+m) space complexity), relatively easy to implement and
works well with source code and text.

Levente

>
> /Leandro
>
> Levente Uzonyi wrote:
>> On Fri, 12 Feb 2010, Leandro wrote:
>>
>>>
>>> Levente Uzonyi wrote:
>>>
>>> You can also get the word based diff using TextDiffBuilder by
>>> subclassing it and overriding #split:. Actually ClassDiffBuilder does
>>> exactly
>>> this, so using ClassDiffBuilder instead of TextDiffBuilder will give
>>> you the expected result.
>>>
>>>
>>> I think that the difference with the implementation Valeria and I did is
>>> that ours switches from lines to words automatically. It also defines
>>> "words"
>>> as Smalltalk words, meaning that it understands keyword limits, which
>>> gives better results. The differentiator starts comparing lines, and if
>>> the
>>> proportion of line differences isn't too high, then it reutilizes the work
>>> already done (with lines) and refines it to get differences in between
>>> keywords. The refining strategy was crucial for achieving good
>>> performance.
>>
>> If you subclass ClassDiffBuilder (let's call the subclass
>> KeywordDiffBuilder)
>> and implement KeywordDiffBuilder >> #split: just like DifferenceFinder >>
>> #keywordsAndBlanksFrom: you get even better results.
>>
>> DifferenceFinder has to use such tricks/hacks to have good performance,
>> because it uses the simplest lcs algorithm which requires n*m time and
>> space in all cases, while for practical cases TextDiffBuilder requires
>> O(n+m) time and space.
>>
>>
>> Levente
>>
>>>
>>> Not sure if the full functionality I'm describing here is in the last Cuis
>>> version though.
>>>
>>> /Leandro
>>>
>>>
>>
>
>

Leandro Caniglia

Re: [ANN] [Cuis] Cuis 2.1 is available. Lots of cool stuff inside!

Thanks a lot.

/Leandro

Levente Uzonyi wrote:

On Sat, 13 Feb 2010, Leandro wrote:

Levente,

Thank you for remarking this. Could you please suggest a reference where I can learn more about the O(n+m) algorithm? (Note however that our "trick" could also be useful with the fast algorithm because it avoids comparing individual words of lines that are common).

TextDiffBuilder uses a modified version of the algorithm described here:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.4.6927
You can find our implementation in TextDiffBuilder >> #lcsFor:and:.
O(n+m) is only true for typical source code/text diffs, where the number of differences between the two sequences is small. TextDiffBuilder also uses an extra filtering which reduces n, m and the number of differences by removing elements which occur in only one of the sequences (this is common if the elements are limes).

Note that there are plenty of algorithms with various time/space complexity for the lcs problem, even hierachical ones which may be better for line+word diffs.

We chose this one, because it's widely used (current unix diff implementations use the recursive version of this algorithm which has guaranteed O(n+m) space complexity), relatively easy to implement and works well with source code and text.

Levente

/Leandro

Levente Uzonyi wrote:

On Fri, 12 Feb 2010, Leandro wrote:

Levente Uzonyi wrote:

You can also get the word based diff using TextDiffBuilder by subclassing it and overriding #split:. Actually ClassDiffBuilder does exactly
this, so using ClassDiffBuilder instead of TextDiffBuilder will give you the expected result.

I think that the difference with the implementation Valeria and I did is that ours switches from lines to words automatically. It also defines "words"
as Smalltalk words, meaning that it understands keyword limits, which gives better results. The differentiator starts comparing lines, and if the
proportion of line differences isn't too high, then it reutilizes the work already done (with lines) and refines it to get differences in between
keywords. The refining strategy was crucial for achieving good performance.

If you subclass ClassDiffBuilder (let's call the subclass KeywordDiffBuilder)
and implement KeywordDiffBuilder >> #split: just like DifferenceFinder >> #keywordsAndBlanksFrom: you get even better results.

DifferenceFinder has to use such tricks/hacks to have good performance, because it uses the simplest lcs algorithm which requires n*m time and space in all cases, while for practical cases TextDiffBuilder requires O(n+m) time and space.

Levente

Not sure if the full functionality I'm describing here is in the last Cuis version though.

/Leandro