Hi Eliot...
> Try VisualWorks Smalltalk or VisualAge Smalltalk. These are dynamically
> compiled and hence have much higher Smalltalk compute performance than
> Dolphin. I think you'd find that for symbolic computation VisualWorks
> was equivalent to Java in speed...

Thank you for the info. I have tried VisualWorks a bit and it is indeed much faster. Unfortunately, like VA, it is quite expensive when used commercially. Java OTOH is free in development and deployment, and times are hard currently. I certainly would like to propose some ST dialect for the next project, but the step is too big, given the difference in price and the fact that all the developers I work with know Java well, while only one (me) knows some Smalltalk. Anyway, Dolphin has some chances in our environment, e.g. as a platform for fat clients and maintenance apps (the former depending on the progress in connectivity add-ons). I am prepared and will take any chances :-)

Ciao

...Jochen
In reply to this post by Blair McGlashan
Dear Blair et al,
> ... A significant point about the Refactoring Engine is that it is (like
> everything in the IDE really) extensible by the user. You can add your own
> custom refactorings if you wish, and indeed some people have:
>
> http://wiki.cs.uiuc.edu/CampSmalltalk/Custom+Refactorings+and+Rewrite+Editor+Usability

This thread seems like a convenient place to remark the following.

1) If a Dolphin Smalltalker were to attend the next Camp Smalltalk (this June in Gronau, Germany), they could help us port our work to Dolphin and would learn the innards of the RB while doing so.

2) I've noted the suggested new refactorings in this thread; they will be added to the list we review to decide what to do at the next CS. If you have ideas for refactorings the RB needs, or have more to say on the ones already suggested, you can add comments, or links to your pages, to our comments page (reachable from the above) or other pages as appropriate.

> ... There is no UI onto this in Dolphin 5 (there will be in the next
> release), but the original Refactoring Browser has one called the
> Rewrite Tool

John in VW7 appears to have deprecated the free-standing Rewrite tool in favour of a rewrite code tool for the RB. We have been drawn to the same approach in our project's VA load, I conjecture for the same reasons:

- a common approach to selecting the environment to browse and the environment to rewrite is cleaner

- custom searches and rewrites naturally interact with invoking RB features; it helps when each can take the result of the other as its point of departure

Thus you may want to build (or already be building) a RewriteCodeTool rather than a RewriteRuleEditor in your next release (and if so, you may also want to consider using our project's approach to building it: RewriteMetaCodeTool and subclasses for its panes).
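For readers who haven't met the machinery these tools drive, here is a minimal sketch of a programmatic rewrite using the RB's ParseTreeRewriter. The pattern syntax is standard RB; the exact accessors (and whether the classes carry an RB or St prefix) vary slightly between dialects, so treat this as illustrative rather than definitive:

```smalltalk
"Rewrite 'expr isNil ifTrue: [...]' into 'expr ifNil: [...]'.
``@rcvr matches any expression; ``@.stmts matches a statement list."
rewriter := ParseTreeRewriter new.
rewriter
	replace: '``@rcvr isNil ifTrue: [``@.stmts]'
	with: '``@rcvr ifNil: [``@.stmts]'.
tree := RBParser parseMethod: 'check self thing isNil ifTrue: [^nil]'.
(rewriter executeTree: tree)
	ifTrue: [Transcript show: rewriter tree formattedCode; cr]
```

The same rule, wrapped in a browser environment, is what a RewriteCodeTool would apply across whatever scope the browser has selected.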
> > Will Loew-Blosser had an experience report at
> > OOPSLA that showed how useful they can be

http://csc.noctrl.edu/f/opdyke/OOPSLA2002/Papers/TransformDataLayer.pdf

Some examples (not requiring UI, so I assume imitable in Dolphin today) are on our pages and in our downloads. More will appear as time goes by.

Yours faithfully

Niall Ross, eXtremeMetaProgrammers
----
(My newsgroup posting address has a spam-trap, which you must remove if replying to me, not just the newsgroup.)
----
In reply to this post by mm_aa
mm_aa wrote:
> Steve,
>
> > Not sure what you mean. Are you adding tools using TTF_IDISHWND or
> > are you using a uID to keep track of the tools?
>
> I just want to move the tooltip support in _whole_ to the dedicated
> class (honestly speaking, I don't like the Smalltalk way of having
> myriads of methods in one class which do everything). That's why I
> need to dispatch TTN_NEEDTEXTA not inside the view, but inside the
> Tooltip class, which will route the necessary commands to the
> presenter.

While you could probably do this, I think you would be fighting an uphill battle. You could experiment with View>>dispatchMessage:wParam:lParam:, but I would definitely make sure your image was saved first. Blair or Andy may have a better way of achieving this, but I don't know of any.

To experiment, I moved your class to be a subclass of ControlView and added the class method:

winClassName
	^'tooltips_class32'

I changed the name of #createWindow to #basicCreateAt:extent: and removed the interactor protocol. I created a Shell with a single PushButton named 'myButton', and used the following workspace:

"Create the Shell and the Tooltip view"
myShell := MMAATooltipShell show.
myTooltip := Tooltip new create.
myTooltip setWindowPosition.

"Install the Shell's child view as a tool"
myTooltip install: (myShell view viewNamed: 'myButton').

"clean up"
myTooltip destroy.

To get it to work, I enabled the pushButton's command in myShell, and added an #onTipTextRequired: to myShell. I know this is not what you are after, but it is the approach I would take. Depending on what kind of Views you want to add as tools, you may need to do some work intercepting #wmNotify:wParam:lParam:, like the Toolbar class and my modifications to PushButton do.

Hope this helps,
Steve

--
Steve Waring
Email: [hidden email]
Journal: http://www.stevewaring.net/blog/home/index.html
In reply to this post by Chris Uppal-3
"Chris Uppal" <[hidden email]> wrote in message
news:3e56075b$0$9695$[hidden email]...
> Blair,
>
> > !MethodBrowser methodsFor!
> >
> > widenSourceSelection
> >     | node |
> >     node := self selectedNode.
> >     node isNil
> >         ifTrue:
> >             ["Normally we'd just disable the command, but to avoid patch to
> >             #queryCommand: ..."
>
> This is nice.
>
> The code itself seems to expose a bug in the reformatter code, though.
> ctrl+shift+s left me with two copies of the comment, one in the original
> place, the other after the surrounding block.

It appears that some methods got lost between VW and Dolphin. If you add these two methods, I believe it will fix your problem:

RBBlockNode>>statementComments
	^self comments

RBCascadeNode>>statementComments
	| statementComments |
	statementComments := OrderedCollection withAll: self comments.
	statementComments addAll: messages first receiver statementComments.
	messages do: [:each |
		each arguments do: [:arg |
			statementComments addAll: arg statementComments]].
	^statementComments asSortedCollection: [:a :b | a first < b first]

John Brant
John Brant wrote:
> It appears that some methods got lost between VW and Dolphin. If you
> add these two methods, I believe it will fix your problem:

That seems to work (though the class names begin with St, rather than RB, in the Dolphin context). Ta.

    -- chris
In reply to this post by Eliot Miranda
"Eliot Miranda" <[hidden email]> wrote in message
news:[hidden email]...
>
> Jochen Riekhof wrote:
> >
> > My current main use of Dolphin is to prototype all sorts of algorithms,
> > e.g. image processing and numerics. The image-based and interpreted
> > approach of Smalltalk is ideal for this. My experience so far is that
> > the average speed of execution of the algorithms is about twenty times
> > faster when ported to Java (no, I do NOT optimize the Java code and
> > write dumb ST code, but rather profile the ST code with Ian's great
> > Profiler and usually do no more optimizing on the Java side).
> >
> > The difference is that in Dolphin code is interpreted, while Java
> > compiles down to native code.
>
> Try VisualWorks Smalltalk or VisualAge Smalltalk. These are dynamically
> compiled and hence have much higher Smalltalk compute performance than
> Dolphin.

Jochen is referring primarily to numeric processing, and VisualWorks performance on that is not "much higher"; in fact it is barely higher at all. Dolphin's fundamental numeric primitives (especially LargeInteger and floating point) are much faster than in VW (I don't know about VA), and judging from micro-benchmarks this seems to make up for much of the speed difference in basic computational performance.

Since Java is a hybrid language with native value types for numerics, it is much easier to achieve near-C speeds for numerics. Maybe with the work you are doing on adaptive inlining we will see that kind of capability in Smalltalk (I look forward to it), but in the meantime if one really wants to do high-performance numeric computation in Smalltalk, then the only realistic option would appear to be Smalltalk MT.

> ...I think you'd find that for symbolic computation VisualWorks
> was equivalent to Java in speed...

As fast as HotSpot? Can you back up that assertion?

Regards

Blair
In reply to this post by Chris Uppal-3
> The code itself seems to expose a bug in the reformatter code, though.
> Doing a ctrl+shift+s left me with two copies of the comment, one in the
> original place, the other after the surrounding block.

Hum, I do not experience this! How can this be?

Ciao

...Jochen
Jochen Riekhof wrote:
> Hum, I do not experience this! How can this be?

Are your formatter settings the same as the ones I posted?

    -- chris
In reply to this post by Blair McGlashan
I am not sure, but my belief is that the overhead comes from the many
method calls necessary to access collection elements. Arrays in Java are native (well, they inherit from Object, but are nevertheless treated "specially" by the VM), so these are extremely fast. I use ArrayList most of the time, however, but the ArrayList accessors (comparable to OrderedCollection) are probably inlined quickly by the HotSpot VM as they are called a lot. At least they are compiled almost instantly.

BTW: I just tried, on some code I currently have at hand, switching off compilation, thereby using interpretation only as in Dolphin. Times are about 10 times slower compared to HotSpot execution. Unfortunately I did not prototype this particular one in Dolphin, so I can't compare directly, but it is likely that the difference for interpreted-only code is only a factor of two between Dolphin and Java. As this particular sample makes heavy use of the primitive arrays that ST is missing, there is probably only a slim difference, if any, in code without extensive array usage. However, on current CPUs compilation apparently allows for huge improvements.

Ciao

...Jochen
Jochen Riekhof wrote:
> I am not shure, but my belief is that the overhead comes from a lot of > method calls necessary to access collection elements. > Arrays in Java are native (well, they inherit from Object, but are > nevertheless treated "specially" by the VM), so these are extremly > fats. I use ArrayList most of the time, however, but the ArrayList > accessors (comparable to OrderedCollection) are probably inlined > quickly by the HotSpot VM as they are called a lot. At least they are > compiled almost instantly. That doesn't sound right. If you are doing numerical and/or image processing work then you'll be using Java's primitive types. Anything else would be suicide for performance. But you can't put primitive types into an ArrayList. So I suspect that the inner loops of your code (the only bits that matter) are all using Java native arrays holding primitive types. If that's the case then I'd expect a difference of less than an order of magnitude between JITed Java and Dolphin for *integer* arithmetic and integer arrays. The difference is huge for floating point, though. (Presumably because of Smalltalk's "boxed" floats.) If you are seeing a 20-to-1 difference then I'd guess that nearly all of it is down to floating-point performance. BTW, as interpreters go, Dolphin is fast. It should beat a JVM running in interpretted mode easily for almost everything. The only exception would be floating point arithmetic, where (because of the boxing again) I'd expect Java to be about twice as fast. -- chris P.S. Mind you, the last time I actually *measured* any of this stuff was back in the days of D3 and JDK1.3 -- and on a now-obsolete machine... |
> That doesn't sound right. If you are doing numerical and/or image
> processing work then you'll be using Java's primitive types.

I never said I use primitive types; it was Blair's guess :-). As I said, I am iterating a lot over array lists with iterators, and indeed do most calculations on floats/doubles that are part of the contents.

> If that's the case then I'd expect a difference of less than an order
> of magnitude between JITed Java and Dolphin for *integer* arithmetic
> and integer arrays. The difference is huge for floating point, though.
> (Presumably because of Smalltalk's "boxed" floats.) If you are seeing
> a 20-to-1 difference then I'd guess that nearly all of it is down to
> floating-point performance.

This may well be.

Ciao

...Jochen
> > If that's the case then I'd expect a difference of less than
> > an order of magnitude between JITed Java and Dolphin
> > for *integer* arithmetic and integer arrays. The difference
> > is huge for floating point, though. (Presumably because of Smalltalk's
> > "boxed" floats.) If you are seeing a 20-to-1 difference then I'd guess
> > that nearly all of it is down to floating-point performance.
>
> This may well be.

No, may not be :-).

I did a VERY quick check on simple-operations performance and here is the result. I measured both server and client VM, in HotSpot and interpreted mode, vs. Dolphin. All code is appended. The reason the server VM seems slow is that it does not have enough time to "warm up": it does a lot of background analysis and compilation that never pays off because the app only runs for a few seconds. You can expect the server VM to be faster than the client VM after a few minutes.

There is not a BIG difference in interpreted mode vs. Dolphin, except that iterators are half the speed of a do: operation. HotSpot in either version is much faster, however. Float is only about a factor of two slower in Dolphin, as opposed to your 1-to-20 guess.

The most striking difference came from the memory management. Dolphin needed more than 30 seconds to allocate the one million Rectangle objects. After closing the workspace, the environment froze for about 45 seconds, for GC I guess. A forced full GC on the Java VM reported:

[Full GC 28108K->741K(51468K), 0.0321105 secs]

Meaning: used memory went down from 28108K to 741K, the total heap minus one survivor space (the copy target for short-term copying GC, typically small) is 51468K, and the collection needed 0.0321105 secs. However, when evaluating the workspace again (overwriting the oc variable), the memory for the Rectangles was apparently reused very effectively.

It should also be noted that the Java VM never gives back any memory to the OS. Blair, does Dolphin do this?

Also, as opposed to the Java example (apparently not executable from a workspace ;-), I used a workspace for the Dolphin test. If this is slower than in a class, please tell me.

My conclusion is now that the reason for my 1-to-20 ratio is probably mainly alloc and GC activity.

Ciao

...Jochen

P.S. You can enable the server VM with the -server command-line flag, -Xint forces interpreted mode, and -verbose:gc prints garbage collection information on the console.

--- The numbers ------------------------------------

Dolphin
time needed alloc (first time) = 36219 !!
time needed alloc = 1874
time needed get (index) = 481
time needed get (iterator) = 383
time needed double mul = 131
time needed (gc): about 45000 !!

java server vm hotspot
time needed alloc = 1438
time needed get (index) = 47
time needed get (iterator) = 63
time needed double mul = 31

java client vm hotspot
time needed alloc = 984
time needed get (index) = 63
time needed get (iterator) = 109
time needed double mul = 16

java server vm interpreted
time needed alloc = 2015
time needed get (index) = 407
time needed get (iterator) = 922
time needed double mul = 78

java client vm interpreted
time needed alloc = 1984
time needed get (index) = 391
time needed get (iterator) = 859
time needed double mul = 47

---Dolphin code (workspace)----------------------

Time millisecondsToRun: [
	oc := OrderedCollection new.
	1 to: 1000000 do: [:each | oc add: Rectangle new]].
Time millisecondsToRun: [
	1 to: 1000000 do: [:each | (oc at: each) top]].
Time millisecondsToRun: [oc do: [:each | each top]].
Time millisecondsToRun: [
	s := 1.00000001.
	1 to: 1000000 do: [:each | s := s * 1.00000001]].
---Java code--------------------------------------

public static void main(String[] args) throws Exception {
	ArrayList al = new ArrayList();

	long t = System.currentTimeMillis();
	for (int i = 0; i < 1000000; i++)
		al.add(new Rectangle());
	System.out.println("time needed alloc = " + (System.currentTimeMillis() - t));

	t = System.currentTimeMillis();
	for (int i = 0; i < al.size(); i++)
		((Rectangle) al.get(i)).getWidth();
	System.out.println("time needed get (index) = " + (System.currentTimeMillis() - t));

	t = System.currentTimeMillis();
	for (Iterator iter = al.iterator(); iter.hasNext(); )
		((Rectangle) iter.next()).getWidth();
	System.out.println("time needed get (iterator) = " + (System.currentTimeMillis() - t));

	t = System.currentTimeMillis();
	double s = 1.00000001;
	for (int i = 0; i < 1000000; i++)
		s = s * 1.00000001;
	System.out.println("time needed double mul = " + (System.currentTimeMillis() - t));

	System.gc();
	Thread.sleep(1000); // wait for gc to complete
}
add-on measurements for VW 7 nc:
> time needed alloc = 6685
> time needed get (index) = 62
> time needed get (iterator) = 38
> time needed double mul = 145

time needed (gc):
Global garbage collection (please wait)...
 reclaimed 27.86 Mbytes of data and 0 OTEntries in 0.2 sec.
 heap shrunk by 21.99 Mbytes
 18.79 Mbytes total; 11.37 Mbytes used, 7.42 Mbytes free.

There were no differences between allocs in VW. The float operations are indeed the same speed as Dolphin. The other operations are comparable to the HotSpot VM, even somewhat faster (as Niall Ross pointed out).

Ciao

...Jochen
In reply to this post by John Brant
"John Brant" <[hidden email]> wrote in message
news:MB76a.209924$iG3.24082@sccrnsc02...
> ...
> It appears that some methods got lost between VW and Dolphin. If you add
> these two methods, I believe it will fix your problem:
> ...

Thanks John (and Chris for reporting it).

Regards

Blair
In reply to this post by NiallRoss
Niall
You wrote in message news:[hidden email]...
> > > ... A significant point about the Refactoring Engine is that it is
> > > (like everything in the IDE really) extensible by the user. You can
> > > add your own custom refactorings if you wish, and indeed some people
> > > have:
> > >
> > > http://wiki.cs.uiuc.edu/CampSmalltalk/Custom+Refactorings+and+Rewrite+Editor+Usability
>
> This thread seems like a convenient place to remark the following.
>
> 1) If a Dolphin Smalltalker were to attend the next Camp Smalltalk (this
> June in Gronau, Germany), they could help us port our work to Dolphin and
> would learn the innards of the RB while doing so.

I'd imagine that if the code were available in chunk format, rather than only Envy .dat files (is that right?), then it could be ported over before then, allowing work on some new refactorings at CS6 :-). Actually I'd really like to have the 'Rename Variable and Accessors' refactoring, so I would port that over myself.

> 2) I've noted the suggested new refactorings in this thread; they will be
> added to the list we review to decide what to do at the next CS. If you
> have ideas for refactorings the RB needs, or have more to say on the ones
> already suggested, you can add comments, or links to your pages, to our
> comments page (reachable from the above) or other pages as appropriate.

I'll do that, but some I'd like to see are:

1) Extract a constant to a class variable. This would add a class variable, introduce or modify a class-side #initialize method to assign the constant to the variable, and then replace all references to the constant with the class variable.

2) Convert a boolean instance variable to a flag in a shared flags instance variable. Needs to introduce and initialize a class variable for the mask, and then create/modify accessors to do the necessary masking.

3) "Extract with holes" (as Don called it when I described it to him). This is a version of Extract Method that takes, in addition to the overall source interval to extract, a collection of intervals to exclude. The idea is to be able to extract a method leaving behind some of the parameter expressions. At the moment I have to do this by first extracting to temporaries all the parameter expressions I want to retain in the source method, then doing the extract method, and then inlining the temps again. I think a reasonable UI onto this could be created by using a subsidiary dialog to build up the list of excluded areas, since most text editors don't support selection of multiple disjoint ranges (unfortunately).

> > > ... There is no UI onto this in Dolphin 5 (there will be in the next
> > > release), but the original Refactoring Browser has one called the
> > > Rewrite Tool
>
> John in VW7 appears to have deprecated the free-standing Rewrite tool in
> favour of a rewrite code tool for the RB. We have been drawn to the same
> approach in our project's VA load, I conjecture for the same reasons:
>
> - a common approach to selecting the environment to browse and the
> environment to rewrite is cleaner
>
> - custom searches and rewrites naturally interact with invoking RB
> features; it helps when each can take the result of the other as its
> point of departure
>
> Thus you may want to build (already be building) a RewriteCodeTool rather
> than a RewriteRuleEditor in your next release (and if so, you may also
> want to consider using our project's approach to building it:
> RewriteMetaCodeTool and subclasses for its panes).

We don't use the RB as such, but have instead taken the approach of integrating the refactoring support into our native browsers (and Debugger). Dolphin's CodeMentor (SmallLint) and CodeRewriter are browser plugins. These work against an environment created based on the current selection in the browsers.

>...

Regards

Blair
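To make the first wished-for refactoring concrete, here is a hand-worked before/after sketch of "Extract a constant to a class variable". The class and variable names are invented for illustration; the refactoring itself would produce code of this shape mechanically:

```smalltalk
"Before: the literal 30000 is embedded wherever the timeout is needed."
Connection>>defaultTimeout
	^30000

"After: a class variable DefaultTimeout is added to Connection, a class-side
#initialize assigns the constant, and each reference to the literal is
replaced with the variable."
Connection class>>initialize
	"Connection initialize"
	DefaultTimeout := 30000

Connection>>defaultTimeout
	^DefaultTimeout
```

The mechanical steps (add variable, create or extend #initialize, replace references) are exactly the kind of multi-site edit the RB's change framework is good at making atomic.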
In reply to this post by Jochen Riekhof
Jochen Riekhof wrote:
> time needed alloc (first time) = 36219 !!
> time needed alloc = 1874

I see the same effect. I think there's something very screwy going on: either a bug or a most unfortunate interaction with the OS.

Blair, the following was all done on a Win2K laptop with 256 Mbytes. No paging activity at any time.

The first oddity is plotting allocation speed against number allocated. It follows a very odd pattern:

Up to about 560K Rectangles allocated, Dolphin takes around 40 msec to allocate 100K rects. The number grows slowly (presumably O(n^2) with low constants, but it's too irregular to tell).

From 570K to 590K the rate decreases sharply to about 10 sec/100K. (I know that sounds like thrashing, but it wasn't the hard disk; my laptop has a *very* noisy disk, so I'm sure of it.)

From 600K to the million mark, the rate decreases slowly and linearly (I plotted the histogram and it looks very linear) from 10 sec/100K to 13 sec/100K.

In all, on my machine, it takes nearly 8 minutes to allocate 1 million Rectangles the first time. Freeing them and then re-running the loop takes just 4 seconds.

The second, and stranger, oddity is this: restart Dolphin, and then execute:

size := 1000000.
oc := OrderedCollection new: size.
1 to: size do: [:i |
	i = 685902 ifTrue: [self halt].
	oc add: Rectangle new].

which halts after about 1/4 of the expected execution time. When the breakpoint hits, go into the debugger, spend a couple of seconds looking around, then resume. It isn't perfectly reproducible, but usually the loop will then complete in almost no time. The 685902 number has no special magic about it except that it does need to be up around 700K. I'm not sure, but I get the impression that just resuming from the breakpoint prompt, or resuming from the debugger very quickly, fails to show the odd effect.

I did wonder if bringing the debugger up was causing Dolphin to allocate a new chunk of memory that it could then recycle for the last 300K Rectangles, but, according to Task Manager, the memory footprint didn't increase until *after* I'd dismissed the debugger.

Puzzles me...

    -- chris
In reply to this post by Jochen Riekhof
"Jochen Riekhof" <[hidden email]> wrote in message
news:[hidden email]...
> > > If that's the case then I'd expect a difference of less than
> > > an order of magnitude between JITed Java and Dolphin
> > > for *integer* arithmetic and integer arrays. The difference
> > > is huge for floating point, though. (Presumably because of Smalltalk's
> > > "boxed" floats.) If you are seeing a 20-to-1 difference then I'd guess
> > > that nearly all of it is down to floating-point performance.
> >
> > This may well be.
>
> No, may not be :-).
>
> I did a VERY quick check on simple operations performance and here is the
> result. I measured both server and client vm in hotspot and interpreted
> mode vs. Dolphin. All code is appended. The reason the server VM seems
> slow is that it does not have enough time to "warm up". It does a lot of
> background analysis and compilation that never pays off because the app
> runs only a few seconds at all. You can expect the server vm to be faster
> than the client vm after a few minutes.
>
> There is not a BIG difference in interpreted mode vs. Dolphin, except
> iterators are half the speed of a do: operation. Hotspot any version is
> much faster, however. Float is only about factor two slower in Dolphin as
> opposed to your 1 to 20 guess.
>
> The most striking difference came from the memory management. Dolphin
> needed more than 30 seconds to allocate the one million Rectangle
> objects. After close of the workspace the env. froze for about 45 seconds
> for gc (I guess).
>...

I was pretty surprised by this, so I thought I'd look to see why. Just looking at the script, something that was immediately apparent is that your allocation test is actually allocating 3 million objects on Dolphin vs. 1 million on Java. This is because Smalltalk Rectangles are actually implemented as a pair of Point objects, whereas Java's is a single block of memory holding 4 integer values. Since this is a micro-benchmark designed to measure object allocation speed, I think it really ought to try and measure the same number of allocations.

Note though that on VW, Rectangle class>>new answers an uninitialized Rectangle, so it is only performing 1 million allocations, at least if we ignore the allocations needed to grow the OrderedCollection. I noticed this when trying to run your benchmark on VW, as it failed on the second expression when attempting to access #top of the first Rectangle. Another point to note is that this isn't a particularly pure test of allocation speed, as Smalltalk has to send a few messages to initialize a Rectangle.

Anyway, regardless of this, I tried out the following slight modification of your script on the 2.2GHz P4 Xeon with 512Mb I happened to be using:

start := Time millisecondClockValue.
Transcript
	display: 'Alloc time: ';
	print: (Time millisecondsToRun: [
		oc := OrderedCollection new.
		"Use #origin:corner: so it will also run on VW - note this
		actually allocates 3 million objects"
		1 to: 1000000 do: [:each | oc add: (Rectangle origin: 0@0 corner: 0@0)]]);
	cr.
Transcript
	display: 'Get (index) time: ';
	print: (Time millisecondsToRun: [1 to: 1000000 do: [:each | (oc at: each) top]]);
	cr.
Transcript
	display: 'Iterate (do) time: ';
	print: (Time millisecondsToRun: [oc do: [:each | each top]]);
	cr.
Transcript
	display: 'Double mul time: ';
	print: (Time millisecondsToRun: [
		s := 1.00000001.
		1 to: 1000000 do: [:each | s := s * 1.00000001]]);
	cr.
Transcript
	display: 'GC time: ';
	print: (Time millisecondsToRun: [
		oc := s := nil.
		MemoryManager current collectGarbage "or ObjectMemory quickGC on VW"]);
	cr.
Transcript
	display: 'Overall runtime: ';
	print: (Time millisecondClockValue - start);
	cr

These are the times I got from Dolphin 6 for the first and second runs, times in milliseconds:

Alloc time: 4116
Get (index) time: 422
Iterate (do) time: 289
Double mul time: 116
GC time: 204
Overall runtime: 5159

Alloc time: 1408
Get (index) time: 418
Iterate (do) time: 290
Double mul time: 105
GC time: 211
Overall runtime: 2441

Running it a number of times, the figures varied a bit, but I haven't bothered to average them. As you can see, the first-run allocation time was significantly better than your experience. I didn't know your machine spec, but assumed that it must be similar since the second-run results are similar. I also didn't see any extended GC time, even if I replaced the #collectGarbage with a #compact, though doing that did mean that the subsequent-run figures were not much faster than the first on the allocation test.

Anyway, I thought this must be something massively improved in D6 vs D5 (though I can't for the life of me think what :-)), so I went back to D5 and got these results:

Alloc time: 52363
Get (index) time: 411
Iterate (do) time: 284
Double mul time: 123
GC time: 190
Overall runtime: 53375

Alloc time: 1275
Get (index) time: 414
Iterate (do) time: 288
Double mul time: 112
GC time: 186
Overall runtime: 2285

I was happy that this coincided with your experience on the initial allocation behaviour (though not that D6 was 100ms slower on the subsequent run, even though this is probably just timing variability). I was still mystified as to the delay you experienced closing the workspace, since this didn't seem to be borne out by the forced GC timings (and if you insert a 'Rectangle primAllInstances size' at the end of the script, you'll see that those Rectangles really have been collected). So I thought I'd try doing as you did, and simply closing the workspace, leaving the variables to be collected at idle time. To my surprise I experienced exactly the same lengthy freeze. I didn't measure its duration, but it was lengthy. I found that if I nilled out the workspace variables before closing the workspace, the delay did not occur, so I can only conclude that there is something very odd going on in the interaction between the view closing and the activities of the garbage collector. Obviously this needs to be investigated, but I don't think it is a fundamental performance problem in the Dolphin collector, as otherwise my other tests would also have shown it.

As a point of reference I tried running the script on VWNC7. I had to change the Transcript #display: messages to #show:, and use "ObjectMemory quickGC" in place of "MemoryManager current collectGarbage" (it seemed the nearest equivalent), and this is what I got:

Alloc time: 40849
Get (index) time: 77
Iterate (do) time: 51
Double mul time: 327
GC time: 116
Overall runtime: 41445

[Subsequent runs were similar]

As you can see, performance on the initial allocation test was poor. I think this is because I either have insufficient memory to run the test in VW, or (more likely) the default memory policy/configuration is not appropriate for this test. Certainly there was an awful lot of flashing up of the GC and dustbin cursors while the test was running. So anyway, I don't think it is really a valid result, and I also think the FP mul figure is questionable, since once again this was probably over-influenced by GC activity.

Anyway Jochen, I believe what brought us to this point was your statement: "Performance is about factor twenty lower than Java HotSpot VM, ..." On this test at least, that would appear to be FUD, right? :-)

[Frankly, though, I think you really need some more "macro" benchmarks, i.e. closer to an actual application, to draw any real performance conclusions]

Regards

Blair
In reply to this post by Jochen Riekhof
Jochen Riekhof wrote:
> I did a VERY quick check on simple operations performance and here is > the result. I measured both server and client vm in hotspot and > interpreted mode vs. Dolphin. I ran essentially the same tests. A few results (I've normalised them against Java since our machines aren't the same speed) and observations: Dolphin relative to Java interpreter (low numbers are faster) alloc = 1.0 get (index) = 1.4 get (iterator) = 0.56 As you say, about the same speed, but you are not comparing like with like. The java.awt.Rectangle class has 4 integer fields. The Dolphin Rectangle class has two Points, which in turn have 2 instvars holding Integers. That affects the implementation of #top since it has to go through twice as many indirections. It also affect the allocation since creating a Rectangle creates three objects (totalling, I believe, 72 bytes), whereas the Java Rectangle is just one object (I think the current Sun JVM will normally take 24 bytes for a Rectangle). I think it's relevant to compare like with like here, so I hacked together a Rectangle2 class that used 4 instvars. Using that for the same loops: Dolphin with Rectangle2 relative to Java interpreter (low numbers are faster) alloc = 0.64 get (index) = 0.98 get (iterator) = 0.33 So Dolphin's interpreter is, as I said, pretty quick. Comparing it against the (client) hotspot JVM: Dolphin with Rectangle2 relative to Hotspot client (low numbers are faster) alloc = 1.5 get (index) = 5.7 get (iterator) = 2.2 A significant difference, but not *vast*. Actually it's less than the difference between the performances of the two machines I use regularly (I use the slower one most often). (It's also less than the difference between compiling using VC++6 and VC.NET. At least the one program I've compiled with VC.NET, same optimisation settings as VC6, produced a .exe that ran 4 times slower! <Grin>) So I come back to my point. 
If your code is running about 20 times faster in Java than in Dolphin, then I think much of the difference is down to the primitive types. BTW, don't forget that for floating point code, Java's floats/doubles are unboxed; Dolphin's are boxed, so every floating point operation involves allocating a new (24 byte?) object on top of the actual FP arithmetic.

--- chris

P.S. for interest: I did compare against a "warmed up" HotSpot server. For these micro-"benchmarks" the results aren't very meaningful. For instance, HotSpot server optimises away the floating point loop completely. For the other tests, FWIW, it was about double the HotSpot client speed. |
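To make Chris's "like with like" point concrete, here is a hypothetical Java sketch of the two object layouts being compared. The class names `Rect4`, `Pt`, and `RectPoints` are mine, for illustration only: the first mirrors java.awt.Rectangle's four scalar fields, the second mirrors Smalltalk's two-Points design, where each `top` access goes through an extra indirection and each construction allocates three objects instead of one.

```java
// Layout 1: four scalar fields in one object, like java.awt.Rectangle.
class Rect4 {
    int x, y, width, height;

    Rect4(int x, int y, int w, int h) {
        this.x = x; this.y = y; this.width = w; this.height = h;
    }

    int top() { return y; } // direct field access, no indirection
}

// Layout 2: two Point objects, like Smalltalk's Rectangle.
class Pt {
    int x, y;

    Pt(int x, int y) { this.x = x; this.y = y; }
}

class RectPoints {
    Pt origin, corner;

    // Creating one RectPoints allocates three objects:
    // the rectangle itself plus its two points.
    RectPoints(int x, int y, int w, int h) {
        this.origin = new Pt(x, y);         // extra allocation
        this.corner = new Pt(x + w, y + h); // extra allocation
    }

    int top() { return origin.y; } // one extra pointer chase
}
```

This is exactly the asymmetry Chris's hand-rolled Rectangle2 removed: once both sides use the four-scalar layout, the allocation and access counts per iteration match.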
In reply to this post by Blair McGlashan
Hi Blair,
Blair McGlashan wrote:

[snip]

> As a point of reference I tried running the script on VWNC7. I had to change
> the Transcript #display: messages to #show:, and use "ObjectMemory quickGC"
> in place of "MemoryManager current collectGarbage" (it seemed the nearest
> equivalent), and this is what I got.
>
> Alloc time: 40849
> Get (index) time: 77
> Iterate (do) time: 51
> Double mul time: 327
> GC time: 116
> Overall runtime: 41445
>
> [Subsequent runs were similar]
>
> As you can see, performance on the initial allocation test was poor. I think
> this is because I either have insufficient memory to run the test in VW, or
> (more likely) the default memory policy/configuration is not appropriate for
> this test. Certainly there was an awful lot of flashing up of the GC and
> dustbin cursors when the test was running. So anyway, I don't think it is
> really a valid result, and I also think the FP mul figure is questionable
> since once again this was probably over influenced by GC activity:

Yes, that's right. The out-of-the-box MemoryPolicy parameters are extremely poor defaults. The problem is easy to fix, though: open the Settings tool (Launcher->System->Settings) and go to the Memory Policy tab. Set Memory Upper Bound to something like the maximum RAM on your system, and set Growth Regime Upper Bound to about 1/2 to 2/3 of that. Check "Update Current Policy", then click "Accept".
Running on my venerable and trusty 400 MHz PII with 380Meg of memory, with an upper bound of 256Meg and a GRUB of 170Meg, I get:

Alloc time: 14707
Get (index) time: 701
Iterate (do) time: 526
Double mul time: 2437
GC time: 983
Overall runtime: 19356

Alloc time: 13684
Get (index) time: 686
Iterate (do) time: 531
Double mul time: 2369
GC time: 979
Overall runtime: 18250

If I scale by the Get (index) time ratio (8.9 - the Iterate ratio is 10.4) I'd get:

Alloc time: 13684 / 8.9 = 1537.53
Get (index) time: 686 / 8.9 = 77.08
Iterate (do) time: 531 / 8.9 = 59.66
Double mul time: 2369 / 8.9 = 266.18
GC time: 979 / 8.9 = 110.0
Overall runtime: 18250 / 8.9 = 2050.56

but I doubt the memory times would scale anything like as well as the Get & Iterate times...

--
_______________,,,^..^,,,____________________________
Eliot Miranda Smalltalk - Scene not herd |
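The machine-speed normalisation Eliot applies above is just a division by the ratio of indexed-access times between the two machines. A trivial helper makes the arithmetic explicit (class and method names are mine):

```java
public class ScaleTimes {
    // Scale a measured time by the speed ratio between two machines,
    // estimated here from their "Get (index)" times.
    static double scale(double ms, double ratio) {
        return ms / ratio;
    }

    public static void main(String[] args) {
        double ratio = 8.9; // Eliot's PII Get (index) time / Blair's machine's
        String[] names = {"Alloc", "Get (index)", "Iterate (do)",
                          "Double mul", "GC", "Overall runtime"};
        double[] times = {13684, 686, 531, 2369, 979, 18250};
        for (int i = 0; i < times.length; i++) {
            System.out.printf("%s time: %.2f%n", names[i], scale(times[i], ratio));
        }
    }
}
```

As Eliot cautions, a single ratio is a crude model: allocation-heavy and GC-heavy phases need not scale with CPU speed the way tight indexed loops do.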
In reply to this post by Blair McGlashan
> I was pretty surprised by this, so I thought I'd look to see why. Just
> looking at the script, something that was immediately apparent is that your
> allocation test is actually allocating 3 million objects on Dolphin, vs 1
> million on Java. This is because Smalltalk Rectangles are actually
> implemented as a pair of Point objects, whereas Java's is a single block of
> memory holding 4 integer values. Since this is a micro-benchmark designed to
> measure object allocation speed, I think it really ought to try and measure
> the same number of allocations. Note though that on VW, Rectangle class>>new
> answers an uninitialized Rectangle, so it is only performing 1 million
> allocations, at least if we ignore the allocations needed to grow the
> OrderedCollection. I noticed this when trying to run your benchmark on VW,
> as it failed on the second expression when attempting to access #top of the
> first Rectangle. Another point to note is that this isn't a particularly
> pure test of allocation speed, as Smalltalk has to send a few messages to
> initialize a Rectangle.

Yep, this is all correct. To make it work on VW, I also used the #origin selector instead of the #top selector used in Dolphin, because the instance vars were all nil. I was too tired to continue yesterday, though :-).

> statement: "Performance is about factor twenty
> lower than Java HotSpot VM, ..." On this test at least, that would appear to be
> FUD, right? :-)
> [Frankly, though, I think you really need some more "macro" benchmarks, i.e.
> closer to an actual application, to draw any real performance conclusions]

The factor of twenty is (for me) the real number, as it stems from some algorithms I ported from ST to Java without further optimization. The first was a "windowizing" algorithm that basically packs a large number of small rectangles into a number of equally sized, much bigger rectangles - the number of big rectangles should be minimal. This involves a lot of allocations, a lot of reordering, and many collection searches and iterations.
This one I optimized with Ian's Profiler, as it was extremely slow. One culprit was the common misuse of SortedCollection: I copied a SortedCollection and then removed from and added to it. After optimizing I got it down to about 1.3 seconds. In Java, in the final environment, I got about 70ms. Unfortunately I cannot hand out the code, as it is not mine.

The second was on images, and invoked many #byteAtOffset: calls to access pixels of bitmaps. I got comparable results - roughly a factor of 20.

Surely no area is more open to interpretation than benchmarking, and my numbers were not as precise as they could have been. Fortunately Chris made up for this :-). Also, both the Java and the ST code can definitely be optimized (thereby making it much less maintainable and readable). I do not intend to do that, as it is (in Java) fast enough.

When I have my Designer hat on, I do not care about the implementation of a Rectangle class, I just use it. If it is by design slower in ST, that's not my problem. This is the price you pay for "everything is an object". I pay the same price the opposite way in Java, e.g. when creating tons of syntactical crap in the form of wrapper classes around integers to use them as Dictionary keys.

Where performance is important, the current choice IMO must be a dynamic-compilation VM. No one I know uses interpreted Java at all. There might be a use for it, e.g. when writing scripts that only run for a very short time.

I will inform you of further "closer to an actual application" comparisons when I have to prototype something again.

...Jochen |
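Jochen's SortedCollection anti-pattern has a direct Java analogue, sketched below with hypothetical code (names and data are mine, not his): re-sorting an entire collection after every insertion versus batching the inserts and sorting once at the end. The boxing of `int` to `Integer` in the same sketch is the "wrapper classes around integers" cost he mentions on the Java side.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortedMisuse {
    // Anti-pattern: keep the data fully sorted after every single mutation,
    // analogous to repeatedly removing from / adding to a SortedCollection.
    static List<Integer> insertSortEachTime(int[] data) {
        List<Integer> out = new ArrayList<>();
        for (int v : data) {
            out.add(v);
            Collections.sort(out); // O(n log n) work on every insert
        }
        return out;
    }

    // Fix: batch the mutations, sort once at the end.
    static List<Integer> sortOnce(int[] data) {
        List<Integer> out = new ArrayList<>();
        for (int v : data) {
            out.add(v); // each int is boxed to an Integer here - the wrapper
                        // overhead Jochen complains about for Dictionary keys
        }
        Collections.sort(out);
        return out;
    }
}
```

Both methods return the same sorted result; the difference is purely in how much work is repeated, which is exactly the kind of thing a profiler run exposes.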