Evaluating Dolphin

97 messages

Re: Evaluating Dolphin [LONG]

Joseph Pelrine-2
Blair, Niall

Blair McGlashan wrote:
Niall

You wrote in message news:3e57b5ad@news.totallyobjects.com...
... A significant point about the Refactoring Engine is that it is (like
everything in the IDE, really) extensible by the user. You can add your
own custom refactorings if you wish, and indeed some people have:


http://wiki.cs.uiuc.edu/CampSmalltalk/Custom+Refactorings+and+Rewrite+Editor+Usability
This thread seems like a convenient place to remark on the following.

1) If a Dolphin Smalltalker were to attend the next Camp Smalltalk (this
June in Gronau, Germany), they could help us port our work to Dolphin and
would learn the innards of the RB while doing so.

I'd imagine that if the code were available in chunk format, rather than
only Envy .dat files (is that right?), then it could be ported over before
then, allowing work on some new refactorings at CS6 :-). Actually I'd really
like to have the 'Rename Variable and Accessors' refactoring, so I would
port that over myself.
If you use Rosetta (http://www.metaprog.com/Rosetta), you can get the code over in .pac format now...

Cheers
--
Joseph Pelrine
MetaProg GmbH
Email: [hidden email]
Web:   http://www.metaprog.com

"If you don't live on the edge, you're taking up too much space" -
Doug Robinson


Re: Evaluating Dolphin

Jochen Riekhof
In reply to this post by Chris Uppal-3
> So I come back to my point.  If your code is running about ~20 times
> faster on Java than Dolphin, then I think much of the difference is down
> to the primitive types.

No, I don't use them much. I think it is the message sends. In this small
"benchmark" it is less apparent than in real-world ST apps, where
everything you do is a message send.

>    alloc = 1.5
>    get (index) = 5.7
>    get (iterator) = 2.2
> A significant difference, but not *vast*.
> ...
> P.S. for interest: I did compare against a "warmed up" hotspot server.
> ...
> For the other tests, FWIW, it was about double the Hotspot client speed.

So you have a factor of five to ten here. This IS vast, at least to me.
Of course this depends on the application, but for the project I am
working on we would have chosen C++ if the Java VM were only 2 times
slower than it is. We have to compete with C software in the market.

Here is a short C++ test I did (VC6 on a 2GHz P4 with 512MB).
Alloc = 1015
Access (index st style) = 63
Access (index java style) = 47
double mul = 16

Compare to hotspot server/client
time needed alloc = 1438/984
time needed get (index) = 47/63
double mul = 31/16

Apparently, at least in this micro-benchmark, there is not much difference
between C++ and Java anymore (unless one uses C in C++).
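For concreteness, the rough shape of the micro-benchmark being traded in this thread can be sketched as follows. This is a hypothetical Python rendering (the posters used Smalltalk, Java and C++); the `Rect` class, the `bench` helper and the loop count are illustrative only, and absolute timings will vary wildly by machine and runtime:

```python
import time

class Rect:
    """Minimal stand-in for the Rectangle objects being allocated."""
    __slots__ = ("origin", "corner")
    def __init__(self, origin, corner):
        self.origin = origin
        self.corner = corner

def bench(label, thunk):
    """Time a zero-argument callable and print the elapsed milliseconds."""
    start = time.perf_counter()
    thunk()
    print(f"{label} = {(time.perf_counter() - start) * 1000:.0f} ms")

N = 1_000_000
rects = []

def alloc():                  # the "alloc" figure: build N rectangles
    for _ in range(N):
        rects.append(Rect((0, 0), (0, 0)))

def get_index():              # the "get (index)" figure: indexed access
    for i in range(N):
        rects[i].origin

def get_iter():               # the "get (iterator)" figure: iteration
    for r in rects:
        r.origin

bench("alloc", alloc)
bench("get (index)", get_index)
bench("get (iterator)", get_iter)
```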


Ciao

...Jochen



Re: Evaluating Dolphin

Jochen Riekhof
In reply to this post by Eliot Miranda
> Alloc time: 13684 / 8.9 1537.53
> Get (index) time: 686 / 8.9 77.0787
> Iterate (do) time: 531 / 8.9 59.6629
> Double mul time: 2369 / 8.9 266.18
> GC time: 979 / 8.9 110.0
> Overall runtime: 18250 / 8.9 2050.56
>
> but I doubt the memory times would scale anything like as well as the
> Get & Index times...

The iterate times are very nice. The double mul time probably shows the
boxed-float effect Chris mentioned.

Ciao

...Jochen



Re: Evaluating Dolphin

Blair McGlashan
In reply to this post by Jochen Riekhof
"Jochen Riekhof" <[hidden email]> wrote in message
news:3e5ba126$[hidden email]...
> [re: Rectangle new]
> Yep, this is all correct. For the VW version to work, I also used the
> origin selector instead of the top selector used in Dolphin, because the
> instance vars were all nil. I was too tired to continue yesterday,
> though :-).

OK, so in that case you really should have rerun equivalent tests on Dolphin
to give a fair comparison. You should use #basicNew to create an
uninitialized Rectangle on both, and the #origin selector on both.

For the record, when I do this on D5 I get a repeatable timing on the alloc
test of ~1240ms for the first run, and ~670ms on the second and subsequent
runs. On VW7 "out of the box" I get a wide range of times between 1354ms
and 5690ms. However, I think this is probably not a fair result. Had you
reconfigured the memory policy on VW at all when you ran the test?

Using #origin, which is just an accessor unlike #top which does a
computation, speeds up the two iteration tests by 30% or more.

>
> > statement: " Performance is about factor twenty
> > lower than Java HotSpot VM, ..." On this test at least, that would
> > appear to be FUD, right? :-)
> > [Frankly, though, I think you really need some more "macro" benchmarks,
> > i.e. closer to an actual application, to draw any real performance
> > conclusions]
>
> The factor twenty is (for me) the real number, as it stems from some
> algorithms I ported from ST to Java without further optimizations. The
> first was a "windowizing" algorithm, that basically puts a large number
> of small rectangles into a number of equally sized, much bigger
> rectangles - the number of big rectangles should be minimal.
> This involves a lot of allocations, a lot of reordering and collection
> searches and iterations.
>...

To me that's a lot more interesting. Do you have figures for VW on that test
which might give us some point of comparison?

> ...
> The second was on images, and invoked many byteAtOffset: calls to access
> pixels of bitmaps.
> I got comparable results - factor 20 roughly.

Frankly, I'm not surprised about that.

>
> Surely there is no larger area of interpretation than benchmarking, and
> my numbers were not as precise as they could have been. Fortunately
> Chris made up for this :-). Also, both the Java and the ST code can
> definitely be optimized (thereby making it much less maintainable and
> readable). I do not intend to do that, as it is (in Java) fast enough.
> When I have the Designer hat on, I do not care about the implementation
> of a Rectangle class, I just use it. If it is by design slower in ST,
> not my problem...

That is a very fair point, and one I would usually agree with, but in this
case you were attempting (I think) to make a micro comparison, so it is
pertinent.

> ... This is the price you pay for "everything is an object". I pay the
> same price the opposite way in Java, e.g. when creating tons of
> syntactical crap in the form of wrapper classes around integers to use
> them as Dictionary keys. Where performance is important, the current
> choice IMO must be a dynamic-compilation VM. No one I know uses
> interpreted Java at all. There might be a use for it, e.g. when writing
> scripts that run for only a very short time.
>
> I will inform you of further "closer to an actual application"
> comparisons when I have to prototype something again.

Great. In the end those comparisons are much more interesting, even if more
difficult to achieve.

Regards

Blair



Re: Evaluating Dolphin

Jochen Riekhof
> > The second was on images, and invoked many byteAtOffset: calls to access
> > pixels of bitmaps.
> > I got comparable results - factor 20 roughly.
>
> Frankly, I'm not surprised about that.

I would be very interested to know if there is a more efficient way to
access bmp data from Dolphin.
I would like to keep all the code in ST, but I have to modify the bmp
pixels, as this is what has to be rendered.
Everything works great so far, except that the pixel access could be a
bit faster.
Also, it reminds me of the awkward code I had to write to check for
top-down bitmaps. Top-down bmps have a negative height in
BITMAPINFOHEADER, but on reading this structure the height is always
positive regardless of bottom-up or top-down. This could well be a
Windows bug, though.
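For reference, biHeight in BITMAPINFOHEADER is a signed 32-bit field, and a negative value marks a top-down DIB; that sign is what the awkward check has to inspect. A small hypothetical Python sketch of reading and normalizing it (parse_bitmapinfoheader is an invented helper, not Dolphin API):

```python
import struct

def parse_bitmapinfoheader(raw):
    """Parse the first three fields of a BITMAPINFOHEADER.
    biHeight is a signed LONG: a negative value marks a top-down DIB,
    so we report the magnitude plus an explicit topDown flag."""
    bi_size, bi_width, bi_height = struct.unpack_from("<Iii", raw, 0)
    return {"biSize": bi_size,
            "biWidth": bi_width,
            "biHeight": abs(bi_height),
            "topDown": bi_height < 0}

# A top-down 8x8 header stub: biSize=40, biWidth=8, biHeight=-8,
# with the remaining 28 bytes of the 40-byte header zeroed.
hdr = struct.pack("<Iii", 40, 8, -8) + bytes(28)
info = parse_bitmapinfoheader(hdr)
print(info)
```

If a struct-reading layer silently takes the height as unsigned, the sign (and with it the top-down flag) is lost, which would produce exactly the always-positive reading described above.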

> That is a very fair point, and one I would usually agree with, but in this
> case you were attempting (I think) to make a micro comparison, so it is
> pertinent.

No, my intent was to find out more about where the factor of 20 might
originate. From these small and immature tests I am now guessing that GC
and alloc might contribute something, and beyond that, message sends
probably gain more and more importance as the code gets more complex.

> Great. In the end those comparisons are much more interesting, even if
more difficult to achieve.

And also more difficult to interpret, as the port will definitely get
"different" in more and more places. But I will report anyway :-)

Ciao

...Jochen



Re: Evaluating Dolphin

Todor Todorov-2
In reply to this post by Jochen Riekhof
Dear Jochen and group!

I've done some testing for fun and here are the results.

                 DPRO    Java     Squeak    VA      VSE     VW7NC
alloc            2160    32500    13400     15800   8100    5000
get (index)      610     180      775       140     115     100
get (iterator)   310     320      580       100     110     59
double mul       240     100      535       160     110     215

The code has been slightly altered to match the Java code as closely as
possible. In other words, we create a rectangle with 0 coordinates and
later access one of its variables (that's what Java's getWidth() method
does). I've tried to be fair, so here is the code.

VW, DPRO, Squeak, VA
[newOrigin := 0@0. newCorner := 0@0. oc := OrderedCollection new.
 1 to: 1000000 do: [:each | oc add: (Rectangle origin: newOrigin corner: newCorner)]].
[1 to: 1000000 do: [:each | (oc at: each) origin]].
[oc do: [:each | each origin]].

VSE
[leftTopPoint := 0@0. rightBottomPoint := 0@0. oc := OrderedCollection new: 1000002.
 1 to: 1000000 do: [:each | oc add: (Rectangle leftTop: leftTopPoint rightBottom: rightBottomPoint)]].
[1 to: 1000000 do: [:each | (oc at: each) leftTop]].
[oc do: [:each | each leftTop]].

My machine is an IBM A31 laptop running Win2K on an Intel P4-1800MHz with
512 MB RAM. The Java platform is IBM VisualAge for Java 4.0 (the normal
free edition downloaded from the net). It's way too difficult to compile
Java files and then invoke java.exe with the correct classpath and a
million other parameters just to get a small test running. IBM's VA Java
resembles Smalltalk a lot - that's why I like it. Does it run as fast as
Java HotSpot?


Well, the conclusion is that Java is the slowest when it comes to
allocating objects (no surprise here). Dolphin has some strange problem
the first time it allocates many objects. Allocating 1 million rectangles
as in the example above goes smoothly (actually the fastest one!), but if
I try to allocate 3 million rectangles, I have to wait minutes before it
finishes. A subsequent run of the code is faster. So there must be
something rotten in the memory allocation algorithm when it reaches a
certain size.

VW does the job well. It's an out-of-the-box installation, so it's not
tuned; I guess if a VW guru tweaks the image a little the numbers will be
better. VSE is somewhere in the middle. I was surprised to see VA being
the slowest ST when it comes to object allocation. After all, it's very
common to allocate a lot of work/temp objects in Smalltalk.

When it comes to iteration, the JIT'ed Smalltalks are faster than Java.
Dolphin could probably gain a lot of performance if it implemented a JIT
VM. VW has a very fast #do: for OrderedCollection!

On floating point, Java is fastest, but good old VSE is not much behind.
VW and Dolphin could probably do a little better.

Well, that's all folks!

I would like to see if other people get the same results, especially
those who can figure out how to start a Java VM other than IBM VAJ.



Re: Evaluating Dolphin

Chris Uppal-3
In reply to this post by Chris Uppal-3
I wrote:

> I see the same effect.  I think there's something very screwy going
> on.  Either a bug or a most unfortunate interaction with the OS.

I just noticed yet a third weirdness: in a freshly started image:

    Time secondsToRun: [(1 to: 3000000) collect: [:i | Array new: 2]]
            --> 13.164

whereas in the same image (restarted again):

    Time secondsToRun: [(1 to: 3000000) collect: [:i | Point new]]
            --> 542.435

Note: times are in seconds, and I'm allocating 3M 2-slot objects, rather than
1M Rectangles.

    -- chris



Re: Evaluating Dolphin

Eliot Miranda
In reply to this post by Todor Todorov-2
Todor Todorov wrote:
[snip]
> VW does the job well. It's out of the box installation so it's not tuned. I
> guess if a VW guru tweaks the image a little the numbers will be better.

When you first drive a new, hired or borrowed car do you adjust the seat
or just leave it as you found it?  The VW memory defaults are indeed
poor defaults and we will change them in the next release, but they are
easy to change and don't require guru level sophistication to do so.  I
would appreciate a slightly higher standard in the publishing of
benchmark results.
--
_______________,,,^..^,,,____________________________
Eliot Miranda              Smalltalk - Scene not herd



Re: Evaluating Dolphin

Bill Schwab-2
> Todor Todorov wrote:
> [snip]
> > VW does the job well. It's out of the box installation so it's not
> > tuned. I guess if a VW guru tweaks the image a little the numbers
> > will be better.
>
> When you first drive a new, hired or borrowed car do you adjust the seat
> or just leave it as you found it?  The VW memory defaults are indeed
> poor defaults and we will change them in the next release, but they are
> easy to change and don't require guru level sophistication to do so.  I
> would appreciate a slightly higher standard in the publishing of
> benchmark results.

First, the results were not _published_, they were posted on an open forum,
with a clear disclaimer indicating that your product might have been
slighted.  Instead of making condescending comments about the effort, why
don't you request the code and/or help the poster tune it?

Sincerely,

Bill

--
Wilhelm K. Schwab, Ph.D.
[hidden email]



Re: Evaluating Dolphin

Chris Uppal-3
In reply to this post by Todor Todorov-2
Todor Todorov wrote:

> IBM's VA Java resembles Smalltalk a lot - that's why I like it. Does
> it run as fast as Java HotSpot?

It is a lot like Smalltalk, in fact (I'm *told* it's a fact, though I have no
personal knowledge) it uses essentially the same VM as VASt. However, for Java,
it is nowhere near as fast as the current crop of JVMs.

> Well, the conclusion is that Java is the slowest when it comes to
> allocating objects (no surprise here).

For allocation, the test we've all been using is *wildly* unrepresentative.
Generational or quasi-generational GC is the norm for VM implementers, and has
been for years.  That implementation approach is dependent on the idea that
most objects will "die young" (that is, are eligible for GC very soon after
they are first created).  Allocating 1M long-lived Rectangles (or whatever) in
a tight loop doesn't even closely represent the typical pattern.
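To illustrate the point, here is a sketch (hypothetical Python; `churn` and its parameters are invented names, not anyone's benchmark code) of an allocation pattern closer to what generational collectors are tuned for: the vast majority of objects become garbage almost immediately, and only a small fraction survive.

```python
import time

def churn(total, keep_every):
    """Allocate `total` small objects but retain only one in `keep_every`.
    Most allocations become unreachable on the next iteration -- the
    "die young" pattern generational GC exploits -- unlike a loop that
    keeps every allocation alive in one big collection."""
    survivors = []
    for i in range(total):
        obj = [i, i]                  # short-lived 2-slot object
        if i % keep_every == 0:
            survivors.append(obj)     # the rare long-lived survivor
    return survivors

start = time.perf_counter()
live = churn(3_000_000, 1000)
print(f"kept {len(live)} of 3000000 objects "
      f"in {time.perf_counter() - start:.2f}s")
```

Varying `keep_every` shifts the benchmark between the two extremes: `keep_every=1` reproduces the keep-everything loop used earlier in the thread, while large values approximate the mostly-garbage workload Chris describes.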

I had a hack at making a better micro-benchmark that tries to reflect the
pattern more closely -- it allocates far more objects than it keeps for long.
It's not a *good* benchmark, even so, but it was fun to play with.  One example
of the difference is that the IBM JVM for Windows (which is *not* the same as
the VAJ VM) switched from being 2x faster than Sun's JVM on the straight loop,
to being 2x slower on my "benchmark" (much more interestingly, it showed
that its relative performance degraded in proportion to the amount of
non-garbage in the "image" -- which *really* surprised me).

FWIW, the Sun JVMs came out best (by a useful margin -- *if* it's real),
Dolphin and the other JVMs fared somewhat worse.  I still couldn't get VW to
perform up to its reputation (well-earned, I believe, though -- again -- I have
no personal knowledge) despite following Eliot's formula, and it trailed by an
order of magnitude.

I've decided that I don't trust the results enough to post them as numbers, not
even in fun, but if anyone's interested in the details, or in the test code,
then please feel free to drop me a line.

    -- chris

P.S.  I also looked at the memory footprint of the running processes (which
should be moderately indicative of real programs' memory reqs in response to a
given load).  Dolphin, VW, and the Sun JVMs were closely bunched, with Dolphin
first by a nose, then the other JVMs trailed in at least a factor of 2 behind.



Re: Evaluating Dolphin

Eliot Miranda
In reply to this post by Bill Schwab-2
Bill Schwab wrote:

>
> > Todor Todorov wrote:
> > [snip]
> > > VW does the job well. It's out of the box installation so it's not
> > > tuned. I guess if a VW guru tweaks the image a little the numbers
> > > will be better.
> >
> > When you first drive a new, hired or borrowed car do you adjust the seat
> > or just leave it as you found it?  The VW memory defaults are indeed
> > poor defaults and we will change them in the next release, but they are
> > easy to change and don't require guru level sophistication to do so.  I
> > would appreciate a slightly higher standard in the publishing of
> > benchmark results.
>
> First, the results were not _published_, they were posted on an open forum,
> with a clear disclaimer indicating that your product might have been
> slighted.  Instead of making condescending comments about the effort, why
> don't you request the code and/or help the poster tune it?

I already did, yesterday.  I also posted results, but they were on a much
slower machine.  I provided scaled results based on the lowest ratio of
the non-allocation benchmarks, with the caveat that I didn't expect the
allocation benchmark to scale as well.

With usenet archives like Google's a posting is in some ways even better
than a publication because it can be retrieved so easily.  I don't think
my comments condescend.  They merely use an analogy to help make the
point.

--
_______________,,,^..^,,,____________________________
Eliot Miranda              Smalltalk - Scene not herd



Re: Evaluating Dolphin

Todor Todorov-2
In reply to this post by Eliot Miranda
Eliot, I am sorry if you woke up in a bad mood this morning. When I
purchase, borrow or hire a new automobile, yes, I adjust the seat. But I
do not adjust or tune the engine. Since I am not buying a race car, but
just an ordinary vehicle, I assume the factory has adjusted and tuned the
engine to a level where it satisfies most people and runs most stably. I
would expect the same of a software product.

But no hard feelings. All I wanted to say with the expression "VW guru" is
that my knowledge is not good enough to play with the VW memory parameters.
I am sorry if you or others have misunderstood my statement. I never meant
to say that a Ph.D. in VW memory management is required to tune VW.

If you tell me exactly what to do to tune the VW memory for that benchmark,
I will gladly rerun the test and post the new results.

-- Todor.

"Eliot Miranda" <[hidden email]> wrote in message
news:[hidden email]...

>
> When you first drive a new, hired or borrowed car do you adjust the seat
> or just leave it as you found it?  The VW memory defaults are indeed
> poor defaults and we will change them in the next release, but they are
> easy to change and don't require guru level sophistication to do so.  I
> would appreciate a slightly higher standard in the publishing of
> benchmark results.
> --
> _______________,,,^..^,,,____________________________
> Eliot Miranda              Smalltalk - Scene not herd



Re: Evaluating Dolphin

Todor Todorov-2
In reply to this post by Eliot Miranda
I've now followed Eliot's instructions.

VW does much better. The alloc time is down to about 1700 ms, so it's the
fastest in my benchmark. Eliot claims that VW is even faster on Linux, but
Bill has cast a spell on my laptop so it won't run Linux.

"Eliot Miranda" <[hidden email]> wrote in message
news:[hidden email]...
>
>
> Todor Todorov wrote:
> [snip]
> > VW does the job well. It's out of the box installation so it's not
> > tuned. I guess if a VW guru tweaks the image a little the numbers
> > will be better.
>
> When you first drive a new, hired or borrowed car do you adjust the seat
> or just leave it as you found it?  The VW memory defaults are indeed
> poor defaults and we will change them in the next release, but they are
> easy to change and don't require guru level sophistication to do so.  I
> would appreciate a slightly higher standard in the publishing of
> benchmark results.
> --
> _______________,,,^..^,,,____________________________
> Eliot Miranda              Smalltalk - Scene not herd



Re: Evaluating Dolphin

Blair McGlashan
In reply to this post by Chris Uppal-3
"Chris Uppal" <[hidden email]> wrote in message
news:3e5b71fc$1$59862$[hidden email]...
> Jochen Riekhof wrote:
>
> > time needed alloc (first time) = 36219  !!
> > time needed alloc = 1874
>
> I see the same effect.  I think there's something very screwy going on.
> Either a bug or a most unfortunate interaction with the OS.

Actually, it's a simple bug, although not in the VM as one might expect,
but in the Smalltalk code which maintains some simple statistics to aid
the decision as to when to perform a GC at times of high allocation rates.
If this bug is patched (see the method below), then I think you will find
that the allocation speed scales pretty linearly for allocations of 1
million or 3 million objects, and there is relatively little difference
between first and subsequent runs.

Here are my JochenMark (:-)) results for allocation of 1 and 3 million
Rectangles with #basicNew on D5, D6 and VWNC7, the latter being tuned as
per Eliot's instructions.

                1M       3M
D6 1st          844      2776
D6 2nd          530      1851
D5 1st          1036     3290
D5 2nd          826      2657
VWNC7           1560     7415     (no sig. dif. between 1st and 2nd runs)

The tests were run on an Athlon 1900MP+ system with 512MB. All times are
in milliseconds.

I don't really consider these results too significant, since this sort of
test is most unlike normal application behaviour, but I'm pleased to have
tracked down an odd source of poor performance, so thanks Jochen and Chris.

Regards

Blair

------------------------
!MemoryManager methodsFor!

otOverflow: anInteger
 | now |
 now := Delay millisecondClockValue.
 now - lastGCTime > (lastGCDuration * 4)
  ifTrue:
   [lastGCTime := now.
   self primCollectGarbage: 1.
   lastGCDuration := Delay millisecondClockValue - now max: 10]! !
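For readers who don't speak Smalltalk, the policy in the patch above can be sketched in Python. This is a hypothetical analogue (`GCThrottle` and `on_overflow` are invented names; the original hangs off `MemoryManager>>otOverflow:`): on object-table overflow, run a full collection only if more than four times the duration of the last collection has elapsed since it ran, with a 10 ms floor on the recorded duration.

```python
import time

class GCThrottle:
    """Throttle full collections triggered by object-table overflow,
    mirroring the statistics kept by the patched otOverflow: method."""
    def __init__(self, collect):
        self.collect = collect          # the actual GC action
        self.last_gc_time = 0.0
        self.last_gc_duration = 0.010   # 10 ms floor, as in the patch

    def on_overflow(self):
        now = time.monotonic()
        if now - self.last_gc_time > self.last_gc_duration * 4:
            self.last_gc_time = now
            self.collect()
            # Record how long the GC took, never less than the floor.
            self.last_gc_duration = max(time.monotonic() - now, 0.010)
            return True                 # collected
        return False                    # throttled

calls = []
t = GCThrottle(lambda: calls.append(1))
first = t.on_overflow()    # long since the "last" GC: collects
second = t.on_overflow()   # immediately after: throttled
print(first, second, len(calls))
```

The bug Blair describes fits this shape: if the recorded duration or timestamp goes wrong, overflow events either trigger far too many collections or suppress needed ones, which is exactly the kind of pathology that shows up only at high allocation rates.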



Re: Evaluating Dolphin

Blair McGlashan
In reply to this post by Jochen Riekhof
"Jochen Riekhof" <[hidden email]> wrote in message
news:[hidden email]...
> > > The second was on images, and invoked many byteAtOffset: calls to
> > > access pixels of bitmaps.
> > > I got comparable results - factor 20 roughly.
> >
> > Frankly, I'm not surprised about that.
>
> I would be very interested to know if there is a more efficient way to
> access  bmp data from Dolphin.

Perhaps, if you posted an example we might know a better way. However, if
it basically comes down to accessing the bytes of a DIBSection directly
through its #imageBits, then you are talking about going through a
relatively unoptimized primitive against an ExternalAddress object. This
is considerably slower than accessing the bytes of a ByteArray through
the #at: primitive, as the following example will demonstrate:

    bytes := ByteArray newFixed: 1000000.
    Time millisecondsToRun: [1 to: bytes size do: [:i | bytes at: i]].
    pBytes := bytes yourAddress asExternalAddress.
    Time millisecondsToRun: [0 to: bytes size - 1 do: [:i | pBytes byteAtOffset: i]].

The first loop runs about twice as fast as the second on my machine, even
though #at: needs to do a more expensive bounds check.

However, whatever you do, you aren't going to touch the speed of direct
indexed access into a primitive array type. And even with that you aren't
going to touch the speed of dedicated graphics code in manipulating your
bitmaps.
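The same direct-versus-pointer access gap can be sketched outside Smalltalk. Here is a hypothetical Python analogue of Blair's ByteArray-versus-ExternalAddress comparison, using ctypes for the raw-pointer path (a smaller N than the Smalltalk example, purely to keep the sketch quick; relative timings are what matter):

```python
import ctypes
import time

N = 200_000
data = bytearray(N)

# Direct indexed access, the analogue of ByteArray>>at:
start = time.perf_counter()
total_direct = 0
for i in range(N):
    total_direct += data[i]
direct_ms = (time.perf_counter() - start) * 1000

# Access through a raw pointer, the analogue of going through an
# ExternalAddress with #byteAtOffset:
buf = (ctypes.c_ubyte * N).from_buffer(data)
ptr = ctypes.cast(buf, ctypes.POINTER(ctypes.c_ubyte))
start = time.perf_counter()
total_ptr = 0
for i in range(N):
    total_ptr += ptr[i]
ptr_ms = (time.perf_counter() - start) * 1000

print(f"direct = {direct_ms:.0f} ms, pointer = {ptr_ms:.0f} ms")
```

As in Dolphin, the indirection layer (here, ctypes marshalling each access) typically costs more than the bounds-checked direct index, illustrating why #imageBits access through an ExternalAddress trails ByteArray>>at:.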

>...
> Also, it reminds me of the awkward code I had to write to check for
> top-down bitmaps. Top-down bmps have a negative height in
> BITMAPINFOHEADER, but on reading this structure the height is always
> positive regardless of bottom-up or top-down. This could well be a
> Windows bug, though.

Not sure about that one, but if anyone has any ideas we'd like to hear about
it.

>
> > That is a very fair point, and one I would usually agree with, but in
> > this case you were attempting (I think) to make a micro comparison, so
> > it is pertinent.
>
> No, my intent was to find out more about where the factor 20 might
> originate. From the small and immature tests I am now guessing that GC
> and alloc might contribute something, and beyond that, message sends
> probably gain more and more importance as the code gets more complex.

Actually, I doubt the allocation really has much to do with it. Please
see Chris Uppal's recent postings, and my reply to him today in this
thread.

BTW: I've also uncovered the reason for the pause you experienced on closing
your workspace if you don't first nil out the variable, but unlike the slow
initial allocation (which requires a small change to one Smalltalk method),
the pause can only be avoided with a patched VM.

Regards

Blair



Re: Evaluating Dolphin

Chris Uppal-3
In reply to this post by Blair McGlashan
Blair,

> If this bug is patched (see attached), then I think you will
> find that the allocation speed will scale pretty linearly for
> allocations of 1 million or 3 million objects, and there will be
> relatively little difference between first and subsequent runs.

Great!  Works a treat.

BTW, the situation that provokes it isn't as wildly unnatural as I'd first
thought.  I checked one of my back-burner projects that stores a largish number
of objects in STB format.  I'd sort of shelved it after discovering that I'd
need a much faster machine than I'm currently using. So I wondered how much
difference the fix made to reading in the STB data.

My toy dataset has about 0.5M objects, which is at the lower end of the
number of objects affected by the bug.  Without the fix it took 60
seconds to read in (after eliminating disk I/O time); with the fix, the
time dropped to 42 seconds.

Not a *big* deal, but then I was dealing with a toy dataset; the real thing
would be 2 to 3 times larger and the bug would have had a calamitous effect.

So thank you for the fix.

    -- chris



Re: Evaluating Dolphin

Jochen Riekhof
In reply to this post by Blair McGlashan
> Perhaps, if you posted an example we might know a better way. However if
> it basically comes down to accessing the bytes of a DIBSection directly
> through its #imageBits, then you are talking about going through a
> relatively unoptimized primitive against an ExternalAddress object. This
> is considerably slower than accessing the bytes of a ByteArray through
> the #at: primitive, as the following example will demonstrate:
>...
> The first loop runs about twice as fast as the second on my machine, even
> though #at: needs to do a more expensive bounds check.

Yes, I use the ExternalAddress exposed by DIBSection>>imageBits.
I have e.g. a pixelAt: method that reads

pixelAt: aPoint
    ^imageBits byteAtOffset: (rowOffsets at: aPoint y + 1) + aPoint x

If I understand you right, this is the fastest way?!
(The rowOffsets seem a bit faster than a multiplication, but mainly help
a lot in dealing with bottom-up and top-down data.)
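The rowOffsets trick - precompute each row's byte offset once, so pixel access needs no multiplication and the bottom-up/top-down distinction is hidden behind the table - can be sketched in Python. This is a hypothetical rendering (invented function names; assumes an 8-bit-per-pixel image and ignores DIB row padding, which a real stride calculation must respect):

```python
def make_row_offsets(height, stride, top_down):
    """Precompute the byte offset of each visual row. Bottom-up DIBs
    store the last visual row first, so visual row y lives at
    (height - 1 - y) * stride; top-down DIBs are stored in order."""
    if top_down:
        return [y * stride for y in range(height)]
    return [(height - 1 - y) * stride for y in range(height)]

def pixel_at(image_bits, row_offsets, x, y):
    # Analogue of: ^imageBits byteAtOffset: (rowOffsets at: y + 1) + x
    return image_bits[row_offsets[y] + x]

# A 4x3 8-bit bottom-up image whose stored bytes are 0..11:
bits = bytes(range(12))
offsets = make_row_offsets(height=3, stride=4, top_down=False)
print(offsets)                         # [8, 4, 0]
print(pixel_at(bits, offsets, 1, 0))   # visual row 0 is the last stored row
```

Because only the table differs between the two orientations, the per-pixel access path is identical either way, which is presumably why the table also "helps a lot" beyond the saved multiplication.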

> BTW: I've also uncovered the reason for the pause you experienced on
> closing your workspace if you don't first nil out the variable, but
> unlike the slow initial allocation (which requires a small change to one
> Smalltalk method), the pause can only be avoided with a patched VM.

The patch is already in the image :-); the freeze on workspace close is
very artificial (as were my tests) and never happened to me in normal
work.

Ciao

...Jochen



Re: Evaluating Dolphin

Bill Schwab-2
In reply to this post by Blair McGlashan
Blair,

> However, whatever you do you aren't going to touch the speed of direct
> indexed access into a primitive array type. And even with that you aren't
> going to touch the speed of dedicated graphics in manipulating your
> bitmaps.

Most of the C++ programming I do is writing DLLs that do number crunching
for Dolphin.  The Smalltalk code allocates and frees memory and controls the
logic, and the C++ does the numerics.  Sometimes that's useful because there
is a lot of existing C++ code, and more often, it's for performance.  This
reminds me of a question that I've wanted to ask, but I'll start another
thread for it.


> BTW: I've also uncovered the reason for the pause you experienced on
> closing your workspace if you don't first nil out the variable, but
> unlike the slow initial allocation (which requires a small change to one
> Smalltalk method), the pause can only be avoided with a patched VM.

Pause?  VM patch?  We don't like pauses :)  Is this something that could be
of general interest?

Have a good one,

Bill

--
Wilhelm K. Schwab, Ph.D.
[hidden email]



Re: Evaluating Dolphin

Eliot Miranda
In reply to this post by Blair McGlashan
> Here are my JochenMark (:-)) results for allocation of 1 and 3 million
> Rectangles with #basicNew on D5, D6 and VWNC7, the latter being tuned as
> per Eliot's instructions.
>
>                         1M        3M
> D6 1st             844       2776
> D6 2nd            530       1851
> D5 1st             1036     3290
> D5 2nd            826       2657
> VWNC7         1560     7415        (no sig. dif. between 1st and 2nd runs)

Interesting! [or alternatively "Ouch!", ed]  The VW oldSpace allocator,
used to allocate tenured objects (which is what this "let's keep tons of
objects around" test stresses), is in VW's case poor w.r.t. a classic
blue-book implementation, because VW doesn't organize its oldSpace free
lists as an objectTableEntry (ote) holding onto an objectBody. Instead it
keeps separate lists of free otes and objectBodies. So allocating an
oldSpace object requires unlinking an ote from one free list and an
objectBody from another. Further, the allocation code is not at all
aggressively inlined and involves at least three procedure calls.

Blair, if you're comfortable discussing it, what oldSpace free list
organization does D5 use?


--
_______________,,,^..^,,,____________________________
Eliot Miranda              Smalltalk - Scene not herd



Re: Evaluating Dolphin

Blair McGlashan
In reply to this post by Bill Schwab-2
"Bill Schwab" <[hidden email]> wrote in message
news:b3oce5$1njhim$[hidden email]...

>
> > BTW: I've also uncovered the reason for the pause you experienced on
> > closing your workspace if you don't first nil out the variable, but
> > unlike the slow initial allocation (which requires a small change to
> > one Smalltalk method), the pause can only be avoided with a patched VM.
>
> Pause?  VM patch?  We don't like pauses :)  Is this something that could
> be of general interest?

You'd only experience a pause if a very large number of objects were
collected by a single idle-time GC cycle - in this case, 3M objects in
one collection. This is unlikely to occur in practice.

Regards

Blair

