Issue 3702 in pharo: squeeze out a bit more speed from TextDiffBuilder

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Issue 3702 in pharo: squeeze out a bit more speed from TextDiffBuilder

pharo
Status: FixedWaitingToBePharoed
Owner: [hidden email]
Labels: Type-Squeak

New issue 3702 by [hidden email]: squeeze out a bit more speed from  
TextDiffBuilder
http://code.google.com/p/pharo/issues/detail?id=3702

I analysed the following changes and they are based on exactly the same  
method versions in pharo so they can be applied (without problems from  
their side)



Name: System-ul.419
Author: ul
Time: 8 February 2011, 3:36:03.896 am
UUID: 0ca6c6be-d524-b444-9c5e-c16ef8059669
Ancestors: System-ul.418

- squeeze out a bit more speed from TextDiffBuilder

=============== Diff against System-ul.418 ===============

Item was changed:
  ----- Method: TextDiffBuilder>>findMatches (in category 'private') -----
  findMatches
        "I find the matching pairs of xLines and yLines. First I filter out  
all lines that can't have a pair, then I find the longest common  
subsequence of the remaining elements. Finally I mark the matching pairs."

+       | temp lcs xFilteredLines yFilteredLines xNumbers yNumbers |
+       "Filter out all lines that can't have a pair."
+       temp := yLines asSet.
-       | lineSet lcs xFilteredLines yFilteredLines |
-       lineSet := yLines asSet.
        xFilteredLines := xLines select: [ :each |
+               temp includes: each ].
-               lineSet includes: each ].
        xFilteredLines size = 0 ifTrue: [ ^self ].
+       temp := xLines asSet.
-       lineSet := xLines asSet.
        yFilteredLines := yLines select: [ :each |
+               temp includes: each ].
-               (lineSet includes: each) ].
        yFilteredLines size = 0 ifTrue: [ ^self ].
+       "Map all lines to SmallIntegers, because they can be compared  
faster."
+       temp := Dictionary new.
+       xNumbers := xFilteredLines collect: [ :each |
+               temp at: each ifAbsentPut: [ temp size ] ].
+       yNumbers := yFilteredLines collect: [ :each |
+               temp at: each ifAbsentPut: [ temp size ] ].
+       temp := nil.
+       "Find the longest common subsequence."
+       lcs := self lcsFor: xNumbers and: yNumbers.
+       "Mark the matching pairs."
-       lcs := self
-               lcsFor: xFilteredLines
-               and: yFilteredLines.
        [ lcs == nil ] whileFalse: [
                (xFilteredLines at: (lcs at: 1)) matches: (yFilteredLines  
at: (lcs at: 2)).
                lcs := lcs at: 3 ]!

Item was changed:
  ----- Method: TextDiffBuilder>>lcsFor:and: (in category 'private') -----
  lcsFor: xFilteredLines and: yFilteredLines
-
        "I find one of the longest common subsequences of my the arguments.  
I assume that none of my arguments are empty. I return nil or an Array  
which represents a list. The first two elements are the matching line  
numbers, the last is the next node in the list or nil if there are no more  
elements. The list containts the longest common subsequence. I'm a modified  
version of the greedy lcs algorithm from the 6th page of 'An O(ND)  
Difference Algorithm and Its Variations (1986)' by Eugene W. Myers"

        | n m v lcss max |
        n := xFilteredLines size.
        m := yFilteredLines size.
        max := m + n.
        v := Array new: 2 * max + 1.
        v at: max + 2 put: 0.
        lcss := Array new: 2 * max + 1.
        0 to: max do: [ :d |
                d negated to: d by: 2 do: [ :k |
                        | index lcs x y |
+                       index := max + k.
+                       (k + d = 0 or: [ k ~= d and: [ (v at: index) < (v  
at: index + 2) ] ])
+                               ifTrue: [ x := v at: (index := index + 2) ]
+                               ifFalse: [ x := (v at: index) + 1 ].
-                       (k + d = 0 or: [ k ~= d and: [ (v at: max + k ) <  
(v at: max + k + 2) ] ])
-                               ifTrue: [
-                                       index := max + k + 2.
-                                       x := v at: index ]
-                               ifFalse: [
-                                       index := max + k.
-                                       x := (v at: index) + 1 ].
                        y := x - k.
                        lcs := lcss at: index.
                        [ x < n and: [ y < m and: [ (xFilteredLines at: x +  
1) = (yFilteredLines at: y + 1) ] ] ]
                                whileTrue: [
                                        lcs := { x := x + 1. y := y + 1. lcs  
} ].
+                       (x >= n and: [ y >= m ]) ifTrue: [ ^lcs ].
-                       (x >= n and: [ y >= m ]) ifTrue: [
-                               ^lcs ].
                        v at: max + k + 1 put: x.
                        lcss at: max + k + 1 put: lcs ] ].
        self error!


Reply | Threaded
Open this post in threaded view
|

Re: Issue 3702 in pharo: squeeze out a bit more speed from TextDiffBuilder

pharo
Updates:
        Labels: Difficulty-Easy

Comment #1 on issue 3702 by [hidden email]: squeeze out a bit more  
speed from TextDiffBuilder
http://code.google.com/p/pharo/issues/detail?id=3702

to create a cs

lcsFor: xFilteredLines and: yFilteredLines
        "I find one of the longest common subsequences of my the arguments. I  
assume that none of my arguments are empty. I return nil or an Array which  
represents a list. The first two elements are the matching line numbers,  
the last is the next node in the list or nil if there are no more elements.  
The list containts the longest common subsequence. I'm a modified version  
of the greedy lcs algorithm from the 6th page of 'An O(ND) Difference  
Algorithm and Its Variations (1986)' by Eugene W. Myers"

        | n m v lcss max |
        n := xFilteredLines size.
        m := yFilteredLines size.
        max := m + n.
        v := Array new: 2 * max + 1.
        v at: max + 2 put: 0.
        lcss := Array new: 2 * max + 1.
        0 to: max do: [ :d |
                d negated to: d by: 2 do: [ :k |
                        | index lcs x y |
                        index := max + k.
                        (k + d = 0 or: [ k ~= d and: [ (v at: index) < (v at: index + 2) ] ])
                                ifTrue: [ x := v at: (index := index + 2) ]
                                ifFalse: [ x := (v at: index) + 1 ].
                        y := x - k.
                        lcs := lcss at: index.
                        [ x < n and: [ y < m and: [ (xFilteredLines at: x + 1) = (yFilteredLines  
at: y + 1) ] ] ]
                                whileTrue: [
                                        lcs := { x := x + 1. y := y + 1. lcs } ].
                        (x >= n and: [ y >= m ]) ifTrue: [ ^lcs ].
                        v at: max + k + 1 put: x.
                        lcss at: max + k + 1 put: lcs ] ].
        self error

and

findMatches
        "I find the matching pairs of xLines and yLines. First I filter out all  
lines that can't have a pair, then I find the longest common subsequence of  
the remaining elements. Finally I mark the matching pairs."

        | temp lcs xFilteredLines yFilteredLines xNumbers yNumbers |
        "Filter out all lines that can't have a pair."
        temp := yLines asSet.
        xFilteredLines := xLines select: [ :each |
                temp includes: each ].
        xFilteredLines size = 0 ifTrue: [ ^self ].
        temp := xLines asSet.
        yFilteredLines := yLines select: [ :each |
                temp includes: each ].
        yFilteredLines size = 0 ifTrue: [ ^self ].
        "Map all lines to SmallIntegers, because they can be compared faster."
        temp := Dictionary new.
        xNumbers := xFilteredLines collect: [ :each |
                temp at: each ifAbsentPut: [ temp size ] ].
        yNumbers := yFilteredLines collect: [ :each |
                temp at: each ifAbsentPut: [ temp size ] ].
        temp := nil.
        "Find the longest common subsequence."
        lcs := self lcsFor: xNumbers and: yNumbers.
        "Mark the matching pairs."
        [ lcs == nil ] whileFalse: [
                (xFilteredLines at: (lcs at: 1)) matches: (yFilteredLines at: (lcs at:  
2)).
                lcs := lcs at: 3 ]