Evaluating Dolphin


Re: Evaluating Dolphin

Jochen Riekhof
Hi Eliot...

> Try VisualWorks Smalltalk or VisualAge Smalltalk.  These are dynamically
> compiled and hence have much higher Smalltalk compute performance than
> Dolphin.  I think you'd find that for symbolic computation VisualWorks
> was equivalent to Java in speed...

Thank you for the info. I have tried VisualWorks a bit and it is indeed much
faster. Unfortunately, like VA, it is quite expensive when used
commercially. Java OTOH is free in development and deployment, and times
are hard currently. I certainly would like to propose some ST dialect for
the next project, but the step is too big, given the difference in price and
that all the developers I work with know Java well, while only one (me)
knows some Smalltalk.

Anyway, Dolphin has some chances in our environment, e.g. as a platform for
fat clients and maintenance apps (the former depending on the progress in
connectivity add-ons).

I am prepared and will take any chances :-)

Ciao

...Jochen



Re: Evaluating Dolphin [LONG]

NiallRoss
In reply to this post by Blair McGlashan
Dear Blair et al,

> ... A significant point
> about the Refactoring Engine, is that it is (like everything in the IDE
> really) extensible by the user. You can add your own custom refactorings if
> you wish, and indeed some people have:
>
> http://wiki.cs.uiuc.edu/CampSmalltalk/Custom+Refactorings+and+Rewrite+Editor+Usability

This thread seems like a convenient place to remark the following.

1) If a Dolphin Smalltalker were to attend the next Camp Smalltalk (this
June in Gronau, Germany), they could help us port our work to Dolphin and
would learn the innards of the RB while doing so.

2) I've noted the suggested new refactorings in this thread;  they will be
added to the list we review to decide what to do at the next CS.  If you
have ideas for refactorings the RB needs, or have more to say on the ones
already suggested, you can add comments, or links to your pages, to our
comments page (reachable from the above) or other pages as appropriate.

> ... There is no UI
> onto this in Dolphin 5 (there will be in the next release), but the original
> Refactoring Browser has one called the Rewrite Tool

John in VW7 appears to have deprecated the free-standing Rewrite tool in
favour of a rewrite code tool for the RB.  We have been drawn to the same
approach in our project's VA load, I conjecture for the same reasons:

 - a common approach to selecting the environment to browse and the
environment to rewrite is cleaner

 - custom searches and rewrites naturally interact with invoking RB
features;  it helps when each can take the result of the other as its point
of departure

Thus you may want to build (or may already be building) a RewriteCodeTool
rather than a RewriteRuleEditor in your next release (and if so, you may
also want to consider using our project's approach to building it:
RewriteMetaCodeTool and subclasses for its panes).

> > Will Loew-Blosser had an experience report at OOPSLA that showed how
> > useful they can be:
> > http://csc.noctrl.edu/f/opdyke/OOPSLA2002/Papers/TransformDataLayer.pdf

Some examples (not requiring UI, so I assume imitable in Dolphin today) are
on our pages and in our downloads.  More will appear as time goes by.

            Yours faithfully
                Niall Ross, eXtremeMetaProgrammers
----
(My newsgroup posting address has a spam-trap, which you
must remove if replying to me, not just the newsgroup.)
----



Re: Evaluating Dolphin

Steve Alan Waring
In reply to this post by mm_aa
mm_aa wrote:

> Steve,
>
>> Not sure what you mean. Are you adding tools using TTF_IDISHWND or
>> are you using a uID to keep track of the tools?
>
> I just want the tooltip support to move in _whole_ to the
> dedicated class (honestly speaking, I don't like the Smalltalk way of
> having myriads of methods in one class which do everything). That's why
> I need to dispatch TTN_NEEDTEXTA not inside the view, but inside the
> Tooltip class, which will route the necessary commands to the
> presenter.

While you could probably do this, I think you would be fighting an uphill
battle. You could experiment with View>>dispatchMessage:wParam:lParam:, but
I would definitely make sure your image was saved first. Blair or Andy may
have a better way of achieving this, but I don't know of one.

To experiment, I moved your class to be a subclass of ControlView and added
the class method:

  winClassName
      ^'tooltips_class32'

I changed the name of #createWindow to #basicCreateAt:extent: and removed
the interactor protocol.

I created a Shell with a single PushButton named: 'myButton', and used the
following workspace:

   "Create the Shell and the Tooltip view"
   myShell := MMAATooltipShell show.
   myTooltip := Tooltip new create.
   myTooltip setWindowPosition
   "Install the Shell's child view as a tool"
   myTooltip install: (myShell view viewNamed: 'myButton')
   "clean up"
   myTooltip destroy

To get it to work, I enabled the pushButton's command in myShell, and added
a #onTipTextRequired: to myShell.

I know this is not what you are after, but it is the approach I would take.
Depending on what kind of Views you want to add as tools, you may need to do
some work intercepting the #wmNotify:wParam:lParam, like the Toolbar class,
and my modifications to PushButton.

Hope this helps,
Steve

--
Steve Waring
Email: [hidden email]
Journal: http://www.stevewaring.net/blog/home/index.html



Re: Couple of small bugs (was: Evaluating Dolphin)

John Brant
In reply to this post by Chris Uppal-3
"Chris Uppal" <[hidden email]> wrote in message
news:3e56075b$0$9695$[hidden email]...

> Blair,
>
> > !MethodBrowser methodsFor!
> >
> > widenSourceSelection
> >  | node |
> >  node := self selectedNode.
> >  node isNil
> >   ifTrue:
> >    ["Normally we'd just disable the command, but to avoid patch to
> > #queryCommand: ..."
>
> This is nice.
>
> The code itself seems to expose a bug in the reformatter code, though.
> Doing a ctrl+shift+s left me with two copies of the comment, one in the
> original place, the other after the surrounding block.

It appears that some methods got lost between VW and Dolphin. If you add
these two methods, I believe it will fix your problem:

RBBlockNode>>statementComments
 ^self comments

RBCascadeNode>>statementComments
 | statementComments |
 statementComments := OrderedCollection withAll: self comments.
 statementComments addAll: messages first receiver statementComments.
 messages do:
   [:each |
   each arguments
    do: [:arg | statementComments addAll: arg statementComments]].
 ^statementComments asSortedCollection: [:a :b | a first < b first]


John Brant



Re: Couple of small bugs (was: Evaluating Dolphin)

Chris Uppal-3
John Brant wrote:

> It appears that some methods got lost between VW and Dolphin. If you
> add these two methods, I believe it will fix your problem:

That seems to work (though the class names begin with St, rather than RB, in
the Dolphin context).

Ta.

    -- chris



Re: Evaluating Dolphin

Blair McGlashan
In reply to this post by Eliot Miranda
"Eliot Miranda" <[hidden email]> wrote in message
news:[hidden email]...
>
>
> Jochen Riekhof wrote:
> >
> > My current main use of Dolphin is to prototype all sorts of algorithms,
> > e.g. image processing and numerics. The image-based and interpreted
> > approach of Smalltalk is ideal for this. My experience so far is that
> > the average speed of execution of the algorithms is about twenty times
> > faster when ported to Java (no, I do NOT optimize the Java code and
> > write dumb ST code, but rather profile the ST code with Ian's great
> > Profiler and usually do no more optimizing on the Java side).
> >
> > The difference is that in Dolphin code is interpreted, while Java
> > compiles down to native code.
>
> Try VisualWorks Smalltalk or VisualAge Smalltalk.  These are dynamically
> compiled and hence have much higher Smalltalk compute performance than
> Dolphin.

Jochen is referring primarily to numeric processing, and VisualWorks
performance on that is not "much higher"; in fact it is barely higher at
all. Dolphin's fundamental numeric primitives (especially LargeInteger and
Floating Point) are much faster than in VW (I don't know about VA), and
judging from micro-benchmarks this seems to make up for much of the speed
difference in basic computational performance.

Since Java is a hybrid language with native value types for numerics, it is
much easier to achieve near C speeds for numerics. Maybe with the work you
are doing on adaptive inlining we will see that kind of capability in
Smalltalk (I look forward to it), but in the meantime if one really wants to
do high-performance numeric computation in Smalltalk, then the only
realistic option would appear to be Smalltalk MT.

>...I think you'd find that for symbolic computation VisualWorks
> was equivalent to Java in speed...

As fast as HotSpot? Can you back up that assertion?

Regards

Blair



Re: Couple of small bugs (was: Evaluating Dolphin)

Jochen Riekhof-3
In reply to this post by Chris Uppal-3
> The code itself seems to expose a bug in the reformatter code, though.
> Doing a ctrl+shift+s left me with two copies of the comment, one in the
> original place, the other after the surrounding block.

Hum, I do not experience this! How can this be?

Ciao

...Jochen



Re: Couple of small bugs (was: Evaluating Dolphin)

Chris Uppal-3
Jochen Riekhof wrote:

> Hum, I do not experience this! How can this be?

Are your formatter settings the same as the ones I posted ?

    -- chris



Re: Evaluating Dolphin

Jochen Riekhof-3
In reply to this post by Blair McGlashan
I am not sure, but my belief is that the overhead comes from the many
method calls necessary to access collection elements.
Arrays in Java are native (well, they inherit from Object, but are
nevertheless treated "specially" by the VM), so these are extremely fast. I
use ArrayList most of the time, however, and the ArrayList accessors
(comparable to OrderedCollection) are probably inlined quickly by the
HotSpot VM as they are called a lot. At least they are compiled almost
instantly.

BTW: I just tried switching off compilation on some code I currently have
at hand, thereby using interpretation only, as in Dolphin. Times are about
10 times slower compared to HotSpot execution. Unfortunately I did not
prototype this particular one in Dolphin, so I can't compare directly, but
it is likely that the difference for interpreted-only code is only a factor
of two between Dolphin and Java. As this particular sample makes heavy use
of the primitive arrays that ST is missing, there is probably only a slim
difference, if any, in code without extensive array usage.

However, on current CPUs compilation apparently allows for huge
improvements.

Ciao

...Jochen



Re: Evaluating Dolphin

Chris Uppal-3
Jochen Riekhof wrote:
> I am not sure, but my belief is that the overhead comes from the many
> method calls necessary to access collection elements.
> Arrays in Java are native (well, they inherit from Object, but are
> nevertheless treated "specially" by the VM), so these are extremely
> fast. I use ArrayList most of the time, however, and the ArrayList
> accessors (comparable to OrderedCollection) are probably inlined
> quickly by the HotSpot VM as they are called a lot. At least they are
> compiled almost instantly.

That doesn't sound right.  If you are doing numerical and/or image processing
work then you'll be using Java's primitive types.  Anything else would be
suicide for performance.  But you can't put primitive types into an ArrayList.
So I suspect that the inner loops of your code (the only bits that matter) are
all using Java native arrays holding primitive types.  If that's the case then
I'd expect a difference of less than an order of magnitude between JITed Java
and Dolphin for *integer* arithmetic and integer arrays.  The difference is
huge for floating point, though. (Presumably because of Smalltalk's "boxed"
floats.)  If you are seeing a 20-to-1 difference then I'd guess that nearly all
of it is down to floating-point performance.

BTW, as interpreters go, Dolphin is fast.  It should beat a JVM running in
interpreted mode easily for almost everything.  The only exception would be
floating point arithmetic, where (because of the boxing again) I'd expect
Java to be about twice as fast.

    -- chris

P.S.  Mind you, the last time I actually *measured* any of this stuff was back
in the days of D3 and JDK1.3 -- and on a now-obsolete machine...
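
[Editor's note: the boxing effect Chris describes can be sketched in Java. This is a rough illustration only, not from the thread, and it uses modern Java syntax (generics, autoboxing) that postdates the JDK 1.3 era under discussion.]

```java
import java.util.ArrayList;
import java.util.List;

public class BoxedVsPrimitive {
    // Sum over a primitive double array: values live inline in the array,
    // with no per-element heap object.
    static double sumPrimitive(double[] values) {
        double s = 0.0;
        for (int i = 0; i < values.length; i++) s += values[i];
        return s;
    }

    // Sum over boxed Doubles: every element is a separate heap object that
    // must be dereferenced and unboxed -- roughly analogous to Smalltalk's
    // boxed Float instances.
    static double sumBoxed(List<Double> values) {
        double s = 0.0;
        for (Double d : values) s += d; // implicit unboxing each iteration
        return s;
    }

    public static void main(String[] args) {
        final int n = 1_000_000;
        double[] primitive = new double[n];
        List<Double> boxed = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            primitive[i] = 1.0;
            boxed.add(1.0); // autoboxing typically allocates a Double per element
        }

        long t = System.nanoTime();
        double a = sumPrimitive(primitive);
        System.out.println("primitive: sum=" + a + " in "
                + (System.nanoTime() - t) / 1_000_000 + " ms");

        t = System.nanoTime();
        double b = sumBoxed(boxed);
        System.out.println("boxed:     sum=" + b + " in "
                + (System.nanoTime() - t) / 1_000_000 + " ms");
    }
}
```

On a modern JIT the primitive loop is typically several times faster; the gap would presumably be larger still on the interpreters being compared in this thread.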



Re: Evaluating Dolphin

Jochen Riekhof-3
> That doesn't sound right.  If you are doing numerical and/or image
> processing work then you'll be using Java's primitive types.

I never said I use primitive types, it was Blair's guess :-).

As I said, I am iterating a lot over array lists with iterators, and indeed
do most calculations on floats/doubles that are part of the contents.

> If that's the case then I'd expect a difference of less than an order of
> magnitude between JITed Java and Dolphin for *integer* arithmetic and
> integer arrays.  The difference is huge for floating point, though.
> (Presumably because of Smalltalk's "boxed" floats.)  If you are seeing a
> 20-to-1 difference then I'd guess that nearly all of it is down to
> floating-point performance.

This may well be.

Ciao

...Jochen



Re: Evaluating Dolphin

Jochen Riekhof
> > If that's the case then I'd expect a difference of less than
> > an order of magnitude between JITed Java and Dolphin
> >  for *integer* arithmetic and integer arrays.  The difference
> > is huge for floating point, though. (Presumably because of Smalltalk's
> > "boxed" floats.)  If you are seeing a 20-to-1 difference then I'd guess
> > that nearly all of it is down to floating-point performance.
>
> This may well be.

No, may not be :-).

I did a VERY quick check on simple-operations performance and here is the
result. I measured both the server and client VM, in HotSpot and interpreted
mode, vs. Dolphin. All code is appended. That the server VM seems slow is
because it does not have enough time to "warm up": it does a lot of
background analysis and compilation that never pays off because the app runs
for only a few seconds. You can expect the server VM to be faster than the
client VM after a few minutes.

There is not a BIG difference in interpreted mode vs. Dolphin, except that
iterators are half the speed of a do: operation.
Any HotSpot version is much faster, however. Float is only about a factor
of two slower in Dolphin, as opposed to your 1-to-20 guess.

The most striking difference came from the memory management. Dolphin
needed more than 30 seconds to allocate the one million Rectangle objects.
After closing the workspace, the environment froze for about 45 seconds,
for GC (I guess).
The Java VM's forced full GC reported [Full GC 28108K->741K(51468K),
0.0321105 secs].
Meaning: used memory went down from 28108K to 741K; the total heap minus one
survivor space (copy target for short-term copying GC, typically small)
needed 0.0321105 secs.

However, when evaluating the workspace again (overwriting the oc variable),
the memory for the Rectangles was apparently reused very effectively. It
should also be noted that the Java VM never gives back any memory to the
OS.
Blair, does Dolphin do this?

Also, as opposed to the Java example (apparently not executable from a
workspace ;-) I used a workspace for the Dolphin test. If this is slower
than in a class, please tell me.

My conclusion is now, that the reason for my 1 to 20 ratio is probably
mainly alloc and gc activity.

Ciao

...Jochen

P.S. You can enable the server VM with the -server command-line flag, -Xint
forces interpreted mode, and finally -verbose:gc prints garbage collection
information on the console.

--- The numbers ------------------------------------
Dolphin
time needed alloc (first time) = 36219  !!
time needed alloc = 1874
time needed get (index) = 481
time needed get (iterator) = 383
time needed double mul = 131
time needed (gc): about 45000 !!

java server vm hotspot
time needed alloc = 1438
time needed get (index) = 47
time needed get (iterator) = 63
time needed double mul = 31

java client vm hotspot
time needed alloc = 984
time needed get (index) = 63
time needed get (iterator) = 109
time needed double mul = 16

java server vm interpreted
time needed alloc = 2015
time needed get (index) = 407
time needed get (iterator) = 922
time needed double mul = 78

java client vm interpreted
time needed alloc = 1984
time needed get (index) = 391
time needed get (iterator) = 859
time needed double mul = 47

---Dolphin code (workspace)----------------------
Time millisecondsToRun: [
oc := OrderedCollection new.
1 to: 1000000 do: [:each | oc add: Rectangle new]].
Time millisecondsToRun: [1 to: 1000000 do: [:each | (oc at: each) top]].
Time millisecondsToRun: [oc do: [:each | each top]].
Time millisecondsToRun: [s := 1.00000001.
    1 to: 1000000 do: [:each | s := s * 1.00000001]].

---Java code--------------------------------------
public static void main(String[] args) throws Exception {
    ArrayList al = new ArrayList();
    long t = System.currentTimeMillis();
    for (int i = 0; i < 1000000; i++)
        al.add(new Rectangle());
    System.out.println("time needed alloc = " + (System.currentTimeMillis() - t));
    t = System.currentTimeMillis();
    for (int i = 0; i < al.size(); i++)
        ((Rectangle) al.get(i)).getWidth();
    System.out.println("time needed get (index) = " + (System.currentTimeMillis() - t));
    t = System.currentTimeMillis();
    for (Iterator iter = al.iterator(); iter.hasNext(); )
        ((Rectangle) iter.next()).getWidth();
    System.out.println("time needed get (iterator) = " + (System.currentTimeMillis() - t));
    t = System.currentTimeMillis();
    double s = 1.00000001;
    for (int i = 0; i < 1000000; i++)
        s = s * 1.00000001;
    System.out.println("time needed double mul = " + (System.currentTimeMillis() - t));
    System.gc();
    Thread.sleep(1000);    // wait for gc to complete
}
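
[Editor's note: the warm-up behaviour Jochen describes can be seen with a minimal harness; my own sketch, not part of the thread. The same hot loop is timed repeatedly, so later runs reflect JIT-compiled code rather than interpreter speed; running it with -Xint should flatten all runs back to interpreted timings.]

```java
public class WarmupDemo {
    // The same floating-point loop as in the benchmark above.
    static double hotLoop() {
        double s = 1.00000001;
        for (int i = 0; i < 1_000_000; i++) s *= 1.00000001;
        return s;
    }

    public static void main(String[] args) {
        // Time identical work several times; once HotSpot has compiled
        // hotLoop(), later runs are typically much faster than the first.
        for (int run = 1; run <= 5; run++) {
            long t = System.nanoTime();
            double s = hotLoop();
            System.out.println("run " + run + ": "
                    + (System.nanoTime() - t) / 1_000 + " us (s=" + s + ")");
        }
    }
}
```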



Re: Evaluating Dolphin

Jochen Riekhof
add-on measurements for VW 7 nc:
> time needed alloc = 6685
> time needed get (index) = 62
> time needed get (iterator) = 38
> time needed double mul = 145
time needed (gc):
    Global garbage collection (please wait)...
    reclaimed 27.86 Mbytes of data and 0 OTEntries in 0.2 sec.
    heap shrunk by 21.99 Mbytes
    18.79 Mbytes total; 11.37 Mbytes used, 7.42 Mbytes free.

There was no difference between first and subsequent allocs in VW.
The float operations are indeed the same speed as Dolphin. The other
operations are comparable to the HotSpot VM, even somewhat faster (as Niall
Ross pointed out).

Ciao

...Jochen



Re: Couple of small bugs (was: Evaluating Dolphin)

Blair McGlashan
In reply to this post by John Brant
"John Brant" <[hidden email]> wrote in message
news:MB76a.209924$iG3.24082@sccrnsc02...
> ...
> It appears that some methods got lost between VW and Dolphin. If you add
> these two methods, I believe it will fix your problem:
> ...

Thanks John (and Chris for reporting it).

Regards

Blair



Re: Evaluating Dolphin [LONG]

Blair McGlashan
In reply to this post by NiallRoss
Niall

You wrote in message news:[hidden email]...
>
> > ... A significant point
> > about the Refactoring Engine, is that it is (like everything in the IDE
> > really) extensible by the user. You can add your own custom refactorings
> > if you wish, and indeed some people have:
> >
> > http://wiki.cs.uiuc.edu/CampSmalltalk/Custom+Refactorings+and+Rewrite+Editor+Usability
>
> This thread seems like a convenient place to remark the following.
>
> 1) If a Dolphin Smalltalker were to attend the next Camp Smalltalk (this
> June in Gronau, Germany), they could help us port our work to Dolphin and
> would learn the innards of the RB while doing so.

I'd imagine that if the code were available in chunk format, rather than
only Envy .dat files (is that right?), then it could be ported over before
then, allowing work on some new refactorings at CS6 :-). Actually I'd really
like to have the 'Rename Variable and Accessors' refactoring, so I would
port that over myself.

> 2) I've noted the suggested new refactorings in this thread;  they will be
> added to the list we review to decide what to do at the next CS.  If you
> have ideas for refactorings the RB needs, or have more to say on the ones
> already suggested, you can add comments, or links to your pages, to our
> comments page (reachable from the above) or other pages as appropriate.

I'll do that, but some I'd like to see are:
1) Extract a constant to a class variable. This would add a class variable,
introduce or modify a class initialize method to assign the constant to the
variable, and then replace all references to the constant with the class
variable.
2) Convert a boolean instance variable to a flag in a shared flags instance
variable. Needs to introduce and initialize a class variable for the mask,
and then create/modify accessors to do the necessary masking.
3) "Extract with holes" (as Don called it when I described it to him). This
is a version of Extract Method that takes, in addition to the overall source
interval to extract, a collection of intervals to exclude. The idea is to be
able to extract a method leaving behind some of the parameter expressions.
At the moment I have to do this by first extracting to temporaries all the
parameter expressions I want to retain in the source method, then doing the
extract method, and then inlining the temps again. I think a reasonable UI
onto this could be created by using a subsidiary dialog to build up the list
of excluded areas, since most text editors don't support selection of
multiple disjoint ranges (unfortunately).

>
> > ... There is no UI
> > onto this in Dolphin 5 (there will be in the next release), but the
> > original Refactoring Browser has one called the Rewrite Tool
>
> John in VW7 appears to have deprecated the free-standing Rewrite tool in
> favour of a rewrite code tool for the RB.  We have been drawn to the same
> approach in our project's VA load, I conjecture for the same reasons:
>
>  - a common approach to selecting the environment to browse and the
> environment to rewrite is cleaner
>
>  - custom searches and rewrites naturally interact with invoking RB
> features;  it helps when each can take the result of the other as its
> point of departure
>
> Thus you may want to build (already be building) a RewriteCodeTool rather
> than a RewriteRuleEditor in your next release (and if so, you may also
> want to consider using our project's approach to building it:
> RewriteMetaCodeTool and subclasses for its panes).

We don't use the RB as such, but have instead taken the approach of
integrating the refactoring support into our native browsers (and Debugger).
Dolphin's CodeMentor (SmallLint) and CodeRewriter are browser plugins. These
work against an environment created based on the current selection in the
browsers.

>...

Regards

Blair



Re: Evaluating Dolphin

Chris Uppal-3
In reply to this post by Jochen Riekhof
Jochen Riekhof wrote:

> time needed alloc (first time) = 36219  !!
> time needed alloc = 1874

I see the same effect.  I think there's something very screwy going on.  Either
a bug or a most unfortunate interaction with the OS.

Blair, the following was all done on a Win2K laptop with 256 Mbytes.  No paging
activity at any time.

The first oddity is plotting allocation speed against number allocated.  It
follows a very odd pattern:

Up to about 560K Rectangles allocated, Dolphin's taking around 40msec to
allocate 100K rects.  The number grows slowly (presumably O(n^2) with low
constants, but it's too irregular to tell).

From 570K to 590K the rate decreases sharply to about 10sec/100K.   (I know
that sounds like thrashing, but it wasn't the hard disk -- my laptop has a
*very* noisy disk, so I'm sure of it.)

From 600K to the million mark, the rate decreases slowly and linearly (I
plotted the histogram and it looks very linear) from 10sec/100K to 13sec/100K.

In all, on my machine, it takes nearly 8 minutes to allocate 1 million
Rectangles the first time.  Freeing them and then re-running the loop takes
just 4 seconds.

The second, and stranger, oddity is this: restart Dolphin, and then execute:

    size := 1000000.
    oc := OrderedCollection new: size.
    1 to: size do:
           [:i |
           i = 685902 ifTrue: [self halt].
           oc add: Rectangle new].

which halts after about 1/4 of the expected execution time.  When the
breakpoint hits, go into the debugger, spend a couple of seconds looking
around, then resume.  It isn't perfectly reproducible, but usually the loop
will then complete in almost no time.  The 685902 number has no special
magic about it, except that it does need to be up around 700K.  I'm not
sure, but I get the impression that just resuming from the breakpoint
prompt, or resuming from the debugger very quickly, fails to show the odd
effect.

I did wonder if bringing the debugger up was causing Dolphin to allocate a new
chunk of memory that it could then recycle for the last 300K Rectangles, but,
according to task manager, the memory footprint didn't increase until *after*
I'd dismissed the debugger.

Puzzles me...

    -- chris



Re: Evaluating Dolphin

Blair McGlashan
In reply to this post by Jochen Riekhof
"Jochen Riekhof" <[hidden email]> wrote in message
news:[hidden email]...
> > > If that's the case then I'd expect a difference of less than
> > > an order of magnitude between JITed Java and Dolphin
> > > for *integer* arithmetic and integer arrays.  The difference
> > > is huge for floating point, though. (Presumably because of Smalltalk's
> > > "boxed" floats.)  If you are seeing a 20-to-1 difference then I'd
> > > guess that nearly all of it is down to floating-point performance.
> >
> > This may well be.
>
> No, may not be :-).
>
> I did a VERY quick check on simple-operations performance and here is the
> result. I measured both the server and client VM, in HotSpot and
> interpreted mode, vs. Dolphin. All code is appended. That the server VM
> seems slow is because it does not have enough time to "warm up": it does
> a lot of background analysis and compilation that never pays off because
> the app runs for only a few seconds. You can expect the server VM to be
> faster than the client VM after a few minutes.
>
> There is not a BIG difference in interpreted mode vs. Dolphin, except
> that iterators are half the speed of a do: operation.
> Any HotSpot version is much faster, however. Float is only about a
> factor of two slower in Dolphin, as opposed to your 1-to-20 guess.
>
> The most striking difference came from the memory management. Dolphin
> needed more than 30 seconds to allocate the one million Rectangle
> objects. After closing the workspace, the environment froze for about
> 45 seconds, for GC (I guess).
>...

I was pretty surprised by this, so I thought I'd look to see why. Just
looking at the script, something that was immediately apparent is that your
allocation test is actually allocating 3 million objects on Dolphin, vs 1
million on Java. This is because Smalltalk Rectangles are actually
implemented as a pair of Point objects, whereas Java's is a single block of
memory holding 4 integer values. Since this is a micro-benchmark designed to
measure object allocation speed, I think it really ought to try and measure
the same number of allocations. Note though that on VW, Rectangle class>>new
answers an uninitialized Rectangle, so it is only performing 1 million
allocations, at least if we ignore the allocations needed to grow the
OrderedCollection. I noticed this when trying to run your benchmark on VW,
as it failed on the second expression when attempting to access #top of the
first Rectangle. Another point to note is that this isn't a particularly
pure test of allocation speed, as Smalltalk has to send a few messages to
initialize a Rectangle.
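
[Editor's note: Blair's allocation-count observation can be mimicked in Java with hypothetical Point/Rect classes; the names are illustrative and this sketch is not from the thread. A rectangle composed of two point objects costs three heap allocations, where a Java-style rectangle with four int fields is a single allocation.]

```java
public class AllocShape {
    // Hypothetical classes mirroring the Smalltalk layout Blair describes:
    // a Rect holds two Point objects, so creating one allocates three objects.
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    static final class Rect {
        final Point origin, corner;
        Rect(Point origin, Point corner) { this.origin = origin; this.corner = corner; }
    }

    public static void main(String[] args) {
        final int n = 1_000_000;
        Rect[] rects = new Rect[n];
        long t = System.currentTimeMillis();
        for (int i = 0; i < n; i++)
            rects[i] = new Rect(new Point(0, 0), new Point(0, 0)); // 3 allocations each
        System.out.println("allocated " + (3L * n) + " objects in "
                + (System.currentTimeMillis() - t) + " ms");
    }
}
```

java.awt.Rectangle, by contrast, is one object with four int fields, so the Java side of the original benchmark performs a third as many allocations.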

Anyway, regardless of this, I tried out the following slight modification of
your script on the 2.2Ghz P4 Xeon with 512Mb I happened to be using:

start := Time millisecondClockValue.
Transcript display: 'Alloc time: '; print: (Time millisecondsToRun: [
    oc := OrderedCollection new.
    "Use #origin:corner: so it will also run on VW - note this actually allocates 3 million objects"
    1 to: 1000000 do: [:each | oc add: (Rectangle origin: 0@0 corner: 0@0)]]); cr.
Transcript display: 'Get (index) time: '; print: (Time millisecondsToRun: [
    1 to: 1000000 do: [:each | (oc at: each) top]]); cr.
Transcript display: 'Iterate (do) time: '; print: (Time millisecondsToRun: [
    oc do: [:each | each top]]); cr.
Transcript display: 'Double mul time: '; print: (Time millisecondsToRun: [
    s := 1.00000001. 1 to: 1000000 do: [:each | s := s * 1.00000001]]); cr.
Transcript display: 'GC time: '; print: (Time millisecondsToRun: [
    oc := s := nil. MemoryManager current collectGarbage "or ObjectMemory quickGC on VW"]); cr.
Transcript display: 'Overall runtime: '; print: (Time millisecondClockValue - start); cr

These are the times I got from Dolphin 6 for the first and second runs,
times in milliseconds:

Alloc time: 4116
Get (index) time: 422
Iterate (do) time: 289
Double mul time: 116
GC time: 204
Overall runtime: 5159

Alloc time: 1408
Get (index) time: 418
Iterate (do) time: 290
Double mul time: 105
GC time: 211
Overall runtime: 2441

Running it a number of times, the figures varied a bit, but I haven't
bothered to average them.

As you can see, the first-run allocation time was significantly better than
in your experience; I don't know your machine spec, but assume it must be
similar, since the second-run results are similar. I also didn't see any
extended GC time, even if I replaced the #collectGarbage with a #compact,
though doing that did mean that the subsequent-run figures were not much
faster than the first on the allocation test. Anyway, I thought this must be
something massively improved in D6 vs D5 (though I can't for the life of me
think what :-)), so I went back to D5 and got these results:

Alloc time: 52363
Get (index) time: 411
Iterate (do) time: 284
Double mul time: 123
GC time: 190
Overall runtime: 53375

Alloc time: 1275
Get (index) time: 414
Iterate (do) time: 288
Double mul time: 112
GC time: 186
Overall runtime: 2285

I was happy that this coincided with your experience on the initial
allocation behaviour (though not that D6 was 100ms slower on the subsequent
run, although this is probably just timing variability).

I was still mystified as to the delay you experienced closing the workspace,
since this didn't seem to be borne out by the forced GC timings (and if you
insert a 'Rectangle primAllInstances size' at the end of the script, you'll
see that those Rectangles really have been collected). So I thought I'd try
out doing as you did, and simply closing the workspace leaving the variables
to be collected at idle time. To my surprise I experienced exactly the same
lengthy freeze. I didn't measure its duration, but it was lengthy. I found
that if I nilled out the workspace variables before closing the workspace,
that the delay did not occur, so I could only conclude that there is
something very odd going on in the interaction between the view closing and
activities of the garbage collector. Obviously this needs to be
investigated, but I don't think it is a fundamental performance problem in
the Dolphin collector, as otherwise my other tests would also have shown
that.

As a point of reference I tried running the script on VWNC7. I had to change
the Transcript #display: messages to #show:, and use "ObjectMemory quickGC"
in place of "MemoryManager current collectGarbage" (it seemed the nearest
equivalent), and this is what I got.

Alloc time: 40849
Get (index) time: 77
Iterate (do) time: 51
Double mul time: 327
GC time: 116
Overall runtime: 41445

[Subsequent runs were similar]

As you can see, performance on the initial allocation test was poor. I think
this is because I either have insufficient memory to run the test in VW, or
(more likely) the default memory policy/configuration is not appropriate for
this test. Certainly there was an awful lot of flashing up of the GC and
dustbin cursors when the test was running. So anyway, I don't think it is
really a valid result, and I also think the FP mul figure is questionable,
since once again it was probably over-influenced by GC activity.

Anyway Jochen, I believe what has brought us to this point was your
statement: " Performance is about factor twenty
lower than Java HotSpot VM, ..." On this test at least, that would appear to
be FUD, right? :-)

[Frankly, though, I think you really need some more "macro" benchmarks, i.e.
closer to an actual application, to draw any real performance conclusions]

Regards

Blair



Re: Evaluating Dolphin

Chris Uppal-3
In reply to this post by Jochen Riekhof
Jochen Riekhof wrote:

> I did a VERY quick check on simple operations performance and here is
> the result. I measured both server and client vm in hotspot and
> interpreted mode vs. Dolphin.

I ran essentially the same tests.  A few results (I've normalised them against
Java since our machines aren't the same speed) and observations:

Dolphin relative to Java interpreter (low numbers are faster)
    alloc = 1.0
    get (index) = 1.4
    get (iterator) = 0.56

As you say, about the same speed, but you are not comparing like with like.
The java.awt.Rectangle class has 4 integer fields.  The Dolphin Rectangle class
has two Points, which in turn have 2 instvars holding Integers.  That affects
the implementation of #top since it has to go through twice as many
indirections.  It also affects the allocation since creating a Rectangle creates
three objects (totalling, I believe, 72 bytes), whereas the Java Rectangle is
just one object (I think the current Sun JVM will normally take 24 bytes for a
Rectangle).  I think it's relevant to compare like with like here, so I hacked
together a Rectangle2 class that used 4 instvars.  Using that for the same
loops:

Dolphin with Rectangle2 relative to Java interpreter (low numbers are faster)
    alloc = 0.64
    get (index) = 0.98
    get (iterator) = 0.33

So Dolphin's interpreter is, as I said, pretty quick.  Comparing it against the
(client) hotspot JVM:

Dolphin with Rectangle2 relative to Hotspot client (low numbers are faster)
    alloc = 1.5
    get (index) = 5.7
    get (iterator) = 2.2

A significant difference, but not *vast*.  Actually it's less than the
difference between the performances of the two machines I use regularly (I use
the slower one most often).

(It's also less than the difference between compiling using VC++6 and VC.NET.
At least the one program I've compiled with VC.NET, same optimisation settings
as VC6, produced a .exe that ran 4 times slower!  <Grin>)

So I come back to my point.  If your code is running ~20 times faster on
Java than Dolphin, then I think much of the difference is down to the primitive
types.

BTW, don't forget that for floating point code, Java's float/doubles are
unboxed; Dolphin's are boxed, so every floating point operation involves
allocating a new (24 byte?) object on top of the actual fp arithmetic.
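Chris's boxing point can be made concrete with a sketch like the one below (the names and loop counts are mine, not from the thread): a primitive `double` multiply stays unboxed, while forcing each result through `Double` pays an object allocation per operation, roughly the cost every boxed-float operation carries on top of the actual fp arithmetic.

```java
// Illustrative sketch (hypothetical names): primitive doubles vs. boxed
// Doubles, the latter approximating the per-operation allocation cost of
// boxed floats described above.
public class BoxingDemo {
    // Primitive version: the multiply stays in a register, no allocation.
    static double mulPrimitive(int n) {
        double d = 1.0;
        for (int i = 0; i < n; i++) d *= 1.0000001;
        return d;
    }

    // Boxed version: each iteration unboxes, multiplies, and wraps the
    // result in a fresh Double object.
    static Double mulBoxed(int n) {
        Double d = Double.valueOf(1.0);
        for (int i = 0; i < n; i++) d = Double.valueOf(d.doubleValue() * 1.0000001);
        return d;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        long t0 = System.nanoTime();
        double a = mulPrimitive(n);
        long primitiveNs = System.nanoTime() - t0;
        t0 = System.nanoTime();
        double b = mulBoxed(n);
        long boxedNs = System.nanoTime() - t0;
        System.out.println("primitive: " + primitiveNs + "ns, boxed: " + boxedNs + "ns");
    }
}
```

(As Chris notes later for Hotspot server, a good JIT may optimise such a loop away entirely, so treat any timings from this with suspicion.)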

    --- chris

P.S. for interest: I did compare against a "warmed up" hotspot server.  For
these micro- "benchmarks" the results aren't very meaningful.  For instance,
Hotspot server optimises away the floating point loop completely.  For the
other tests, FWIW, it was about double the Hotspot client speed.



Re: Evaluating Dolphin

Eliot Miranda
In reply to this post by Blair McGlashan
Hi Blair,

Blair McGlashan wrote:
[snip]

> As a point of reference I tried running the script on VWNC7. I had to change
> the Transcript #display: messages to #show:, and use "ObjectMemory quickGC"
> in place of "MemoryManager current collectGarbage" (it seemed the nearest
> equivalent), and this is what I got.
>
> Alloc time: 40849
> Get (index) time: 77
> Iterate (do) time: 51
> Double mul time: 327
> GC time: 116
> Overall runtime: 41445
>
> [Subsequent runs were similar]
>
> As you can see, performance on the initial allocation test was poor. I think
> this is because I either have insufficient memory to run the test in VW, or
> (more likely) the default memory policy/configuration is not appropriate for
> this test. Certainly there was an awful lot of flashing up of the GC and
> dustbin cursors when the test was running. So anyway, I don't think it is
> really a valid result, and I also think the FP mul figure is questionable
> since once again this was probably over influenced by GC activity:

Yes, that's right.  The default MemoryPolicy parameters out of the box
are extremely poor.  Fixing the problem is really easy, though:

Open the Settings tool (Launcher->System->Settings) and open the Memory
Policy tab.
Set Memory Upper Bound to something like the max RAM on your system.
Set Growth Regime Upper Bound to about 1/2 to 2/3 of that max.  Check
"Update Current Policy", then click "Accept".

Here are the times I get running on my venerable and trusty 400 MHz
PII with 380Meg of memory, with an upper bound of 256Meg and a GRUB of
170Meg:

Alloc time: 14707
Get (index) time: 701
Iterate (do) time: 526
Double mul time: 2437
GC time: 983
Overall runtime: 19356

Alloc time: 13684
Get (index) time: 686
Iterate (do) time: 531
Double mul time: 2369
GC time: 979
Overall runtime: 18250

If I scale by the Get (index) time ratio (8.9 - the Iterate ratio is
10.4) I'd get:

Alloc time: 13684 / 8.9 = 1537.53
Get (index) time: 686 / 8.9 = 77.0787
Iterate (do) time: 531 / 8.9 = 59.6629
Double mul time: 2369 / 8.9 = 266.18
GC time: 979 / 8.9 = 110.0
Overall runtime: 18250 / 8.9 = 2050.56

but I doubt the memory times would scale anything like as well as the
Get & Index times...
--
_______________,,,^..^,,,____________________________
Eliot Miranda              Smalltalk - Scene not herd



Re: Evaluating Dolphin

Jochen Riekhof
In reply to this post by Blair McGlashan
> I was pretty surprised by this, so I thought I'd look to see why. Just
> looking at the script, something that was immediately apparent is that your
> allocation test is actually allocating 3 million objects on Dolphin, vs 1
> million on Java. This is because Smalltalk Rectangles are actually
> implemented as a pair of Point objects, whereas Java's is a single block of
> memory holding 4 integer values. Since this is a micro-benchmark designed to
> measure object allocation speed, I think it really ought to try and measure
> the same number of allocations. Note though that on VW, Rectangle class>>new
> answers an uninitialized Rectangle, so it is only performing 1 million
> allocations, at least if we ignore the allocations needed to grow the
> OrderedCollection. I noticed this when trying to run your benchmark on VW,
> as it failed on the second expression when attempting to access #top of the
> first Rectangle. Another point to note is that this isn't a particularly
> pure test of allocation speed, as Smalltalk has to send a few messages to
> initialize a Rectangle.

Yep, this is all correct. To get it to work on VW, I also used the #origin
selector instead of the #top selector used on Dolphin, because the instance
vars were all nil. I was too tired to continue yesterday, though :-).

> statement: " Performance is about factor twenty
> lower than Java HotSpot VM, ..." On this test at least, that would appear
> to be FUD, right? :-)
> [Frankly, though, I think you really need some more "macro" benchmarks,
> i.e. closer to an actual application, to draw any real performance
> conclusions]

The factor of twenty is (for me) the real number, as it stems from some
algorithms I ported from ST to Java without further optimization. The first
was a "windowizing" algorithm, which basically packs a large number of small
rectangles into a number of equally sized, much bigger rectangles, where the
number of big rectangles should be minimal.
This involves a lot of allocations, a lot of reordering, and a lot of
collection searches and iterations.
This one I optimized with Ian's profiler, as it was extremely slow. One
issue was the common misuse of SortedCollection: I copied a SortedCollection
and then removed from/added to it. I then got about 1.3 seconds. In Java, in
the final environment, I got about 70ms. Unfortunately I cannot hand out the
code, as it is not mine.

The second was on images, and involved many #byteAtOffset: calls to access
pixels of bitmaps.
I got comparable results - roughly a factor of 20.

Surely there is no larger area of interpretation than benchmarking, and
my numbers were not as precise as they could have been. Fortunately Chris
made up for this :-). Also, both the Java and the ST code can definitely be
optimized (thereby making them much less maintainable and readable). I do
not intend to do that, as it is (in Java) fast enough. With the designer
hat on, I do not care about the implementation of a Rectangle class, I just
use it. If it is by design slower in ST, that is not my problem. This is the
price you pay for "everything is an object". I pay the same price the
opposite way in Java, e.g. when creating tons of syntactic clutter in the
form of wrapper classes around integers to use them as Dictionary keys.
Where performance is important, the current choice IMO must be a dynamically
compiling VM. No one I know uses interpreted Java at all; there might be a
use for it, e.g. when writing scripts that run for only a very short time.
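The wrapper-class chore mentioned above looks something like the sketch below (a hypothetical example of mine, not code from the thread): a raw `int` cannot be a `HashMap` key directly, so it has to be wrapped in an `Integer` object on the way in and on the way out. (Java 5's autoboxing later hid the wrapper syntactically, but the allocation remains.)

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of using ints as Dictionary (HashMap) keys in
// Java: explicit Integer wrapper objects are needed, the mirror image of
// Smalltalk's "everything is an object" cost.
public class IntKeyDemo {
    static String lookup(Map<Integer, String> table, int id) {
        // The int must be wrapped before it can be used as a key.
        return table.get(Integer.valueOf(id));
    }

    public static void main(String[] args) {
        Map<Integer, String> names = new HashMap<>();
        names.put(Integer.valueOf(42), "answer"); // wrapper on insertion too
        System.out.println(lookup(names, 42));
    }
}
```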

I will report further "closer to an actual application" comparisons
when I next have to prototype something.



...Jochen

