Is it worth delaying the release?

Bryce Kampjes

Is it worth delaying the release until after the 3.9 upgrade and
the VMMaker upgrade? I've just started moving to a 3.9 image. There
are 24 failing tests; most fail because 3.9 produces different
bytecodes than 3.8 for some things. This seems to be due to block
processing, but I haven't fully investigated.

There are a few other failures including one due to a bug in 3.9.

Is it more useful for me to release as is, with a solid release on
3.8 before moving to 3.9, or to upgrade everything now? I'm going to
upgrade next anyway. I'm not going to maintain different tests for
both releases, so once I fix them for 3.9 they will fail in 3.8.

Bryce
_______________________________________________
Exupery mailing list
[hidden email]
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/exupery

Re: Is it worth delaying the release?

Colin Putney

On Nov 12, 2006, at 12:28 PM, <[hidden email]>  
<[hidden email]> wrote:

> Is it more useful for me to release as is, with a solid release on
> 3.8 before moving to 3.9, or to upgrade everything now? I'm going to
> upgrade next anyway. I'm not going to maintain different tests for
> both releases, so once I fix them for 3.9 they will fail in 3.8.

Better to make a solid release for 3.8. That has value regardless of  
the existence of 3.9.

Colin


Re: Is it worth delaying the release?

Tony Garnock-Jones-2
In reply to this post by Bryce Kampjes
[hidden email] wrote:
> I'm not going to maintain different tests for
> both releases so once I fix them for 3.9 they will fail in 3.8.

That's interesting! 3.9 didn't need a VM upgrade, did it? What's
different about Exupery?

Tony


Re: Is it worth delaying the release?

Andrew Tween
In reply to this post by Bryce Kampjes
Hi Bryce,
I think it is a good idea to release the solid 3.8 version.

Having said that, I am looking forward to the 3.9 release because I really want
to try using Exupery on my sub-pixel font filtering algorithm to see if it can
speed it up. Currently this is in 3.9, and I don't want to port it all back to
an earlier image/vm, especially since you are moving forward to 3.9.

This is probably a topic for another thread, but could you tell from looking at
the attached method if it is a good candidate for speed-up. It has nested loops,
does lots of at: and integerAt:Put: (prim 166) , and SmallInteger bitShift: ,
bitAnd: , *, + , // , and some Float calcs.

Cheers,
Andy


[Attachment: GlyphForm-asBalancedGlyphFormWithDepth32ItalicFIR.st, a uuencoded Smalltalk file-out of the method under discussion]



Re: Is it worth delaying the release?

Bryce Kampjes

Hi Andy, Any chance you could build a Win 32 version of the VM for the
release?  VMs for other platforms would be nice too. It would be
really great to release on two platforms at once.

The versions to use are:
  Exupery-wbk.219
  VMMAker-wbk.42

The big decision was really between releasing on VMMaker 3.8b3 based
VMs or upgrading to VMMaker 3.8b6. We're hoping that upgrading will
solve the problems that the Mac x86 port is having. Hopefully a Mac
port should appear during 0.11 development. Upgrading VMMaker risks
destabilizing this release, and also makes it harder for the ports
that exist to build VMs to go with the release.

I've now got a working 3.9 development image based on the squeak-dev
images. I'll include that image along with the release. The exupery39
versions are the port to 3.9. The only problems have been with tests.
23 tests were failing because the bytecodes are all 8 bytes further
down in the MethodContexts in 3.9. One test was failing due to a bug
fix in 3.9. Exupery currently works in both 3.8 and 3.9 images but not
all the tests will pass in both images.

I've moved all the VM code into the VMMaker package to make it easier
to see when a new VM may be needed. If the VMMaker package hasn't
changed then none of the VM code will have changed, and all the
changes were in image side code. Previously there was no easy way to
see if a new VM is required between different versions of Exupery.
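
The rule above (only a change to the VMMaker package implies a new VM) can be sketched as a small check. This is an illustration in Python rather than anything shipped with Exupery: the function name and the dictionary layout are assumptions, and the "later" version strings are hypothetical. Only Exupery-wbk.219 and VMMAker-wbk.42 come from this message.

```python
def needs_new_vm(old_versions: dict, new_versions: dict) -> bool:
    """Return True when the VMMaker package version differs between two releases.

    Assumption: each release is described by a dict mapping a package
    name to its version string, as in the examples below.
    """
    return old_versions.get("VMMaker") != new_versions.get("VMMaker")


# Versions named in the message above.
release = {"Exupery": "Exupery-wbk.219", "VMMaker": "VMMAker-wbk.42"}

# Hypothetical later versions, invented purely for illustration.
image_only_change = {"Exupery": "Exupery-wbk.220", "VMMaker": "VMMAker-wbk.42"}
vm_change = {"Exupery": "Exupery-wbk.220", "VMMaker": "VMMAker-wbk.43"}

if __name__ == "__main__":
    print(needs_new_vm(release, image_only_change))
    print(needs_new_vm(release, vm_change))
```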

The release will be built on 3.8, with the old, well-tested VMMaker
3.8b3 VMs, and will include a 3.9 developer image with the tests fixed.
So the release image will be slightly ahead of the release but only
tests will have changed.

Bryce

Re: Is it worth delaying the release?

Andrew Tween

<[hidden email]> wrote in message
news:[hidden email]...
>
> Hi Andy, Any chance you could build a Win 32 version of the VM for the
> release?  VM's for other platforms would also be nice too. It would be
> really great to release on two platforms at once.
>
> The versions to use are:
>   Exupery-wbk.219
>   VMMAker-wbk.42
>

No problem.
Any particular version of the SVN vm sources?
Are you building the vm from a 3.8 basic image, 3.8 full, or 3.9? It may not
make any difference, but I would like to build from the exact-same setup that
you use.


Exupery for sub-pixel font filtering.

Bryce Kampjes
In reply to this post by Andrew Tween

Hello again,
This time about sub-pixel aliasing.

Andrew Tween writes:
 > Hi Bryce,
 > I think it is a good idea to release the solid 3.8 version.
 >
 > Having said that, I am looking forward to the 3.9 release because I really want
 > to try using Exupery on my sub-pixel font filtering algorithm to see if it can
 > speed it up. Currently this is in 3.9, and I don't want to port it all back to
 > an earlier image/vm, especially since you are moving forward to 3.9.

Exupery runs fine on 3.9, the tests just needed to be fixed.

The best way to find out how it performs for your example would be to
load Exupery into your 3.9 image and try it.

 > This is probably a topic for another thread, but could you tell from looking at
 > the attached method if it is a good candidate for speed-up. It has nested loops,
 > does lots of at: and integerAt:Put: (prim 166) , and SmallInteger bitShift: ,
 > bitAnd: , *, + , // , and some Float calcs.

I'm not sure how well it would run. The code is definitely a promising
candidate to compile; however, Exupery doesn't yet compile Floats, large
integers, or primitive 166. I don't think the interpreter does any
special optimisations for them either, so chances are those operations
will run at the same speed. Exupery will be able to optimise the
SmallInteger calculations and looping overhead.

The method could definitely be optimised much more. Adding
integerAt:put: and ByteArray>>at: primitives would help. So would
basic floating point optimisations. Going further, adding support for
machine word (32 bit integer) and byte objects should allow us to
compile to near C speeds.

The optimisations for machine words, byte objects, and floating point
are all very similar. The game is to remove all the intermediate
objects so the calculations are done directly in registers without any
conversion and deconversion overhead.

  luminance := (0.299*balR)+(0.587*balG)+(0.114*balB).
  balR := balR + ((luminance - balR)*correctionFactor).
  balG := balG + ((luminance - balG)*correctionFactor).
  balB := balB + ((luminance - balB)*correctionFactor).
  balR := balR  truncated.
  balR < 0 ifTrue:[balR := 0] ifFalse:[balR > 255 ifTrue:[balR := 255]].
  balG := balG  truncated.
  balG < 0 ifTrue:[balG := 0] ifFalse:[balG > 255 ifTrue:[balG := 255]].
  balB := balB  truncated.
  balB < 0 ifTrue:[balB := 0] ifFalse:[balB > 255 ifTrue:[balB := 255]].
  a := balR + balG + balB > 0 ifTrue:[16rFF] ifFalse:[0].
  colorVal := balB + (balG bitShift: 8) +  (balR bitShift: 16) + (a bitShift: 24).
  answer bits integerAt: (y*answer width)+(x//3+1) put: colorVal.

This is a nice example to show what dynamically inlined primitives
could do. The major overhead with floats is allocating memory (1). In
this example, using the current optimisation engine, it should be
possible to create only 4 floats rather than the 19 needed by the
interpreter. One more allocation will be needed to form colorVal if it
overflows into a LargeInteger. SSA should allow all the floating point
intermediate values to be removed by allowing program analysis over
more than one statement.
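
To make the colorVal overflow concrete, here is a minimal sketch in Python that models the packing from the quoted method. It assumes a 32-bit Squeak image, where SmallInteger is a 31-bit signed value with a maximum of 1073741823. The function names are invented for the illustration and are not part of Squeak or Exupery.

```python
# Assumption: 32-bit Squeak image, SmallInteger maxVal = 2**30 - 1.
SMALL_INTEGER_MAX = 2**30 - 1


def pack_argb(bal_r: int, bal_g: int, bal_b: int) -> int:
    """Pack already-clamped 0..255 channels the way the quoted Smalltalk does."""
    # Alpha is fully opaque whenever any channel is non-zero.
    a = 0xFF if bal_r + bal_g + bal_b > 0 else 0
    return bal_b + (bal_g << 8) + (bal_r << 16) + (a << 24)


def needs_large_integer(value: int) -> bool:
    """True if value would not fit in a SmallInteger under the stated assumption."""
    return value > SMALL_INTEGER_MAX


if __name__ == "__main__":
    for rgb in [(0, 0, 0), (0, 0, 1), (255, 255, 255)]:
        color_val = pack_argb(*rgb)
        print(rgb, hex(color_val), needs_large_integer(color_val))
```

Under that assumption, any pixel with a non-zero channel sets the alpha byte and so lands above the SmallInteger range, which would make the extra allocation the common case for visible pixels rather than a rare one.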

  balR := balR  truncated.
  balR < 0 ifTrue:[balR := 0] ifFalse:[balR > 255 ifTrue:[balR := 255]].

This should probably be handled via a primitive that truncates a
floating point value down to an unsigned 8 bit value. For this example
such a primitive may be overkill however converting floating point
values to. But with Exupery 3.0 and SSA it would be really nice to be
able to optimise to vectors. With vector optimisation we will have a
level playing field with C: they will need at least as much compiler
machinery as we will, and they will probably write their compilers in
C, requiring much more work than writing in Smalltalk.
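
As a rough model of what such a truncate-to-unsigned-8-bit primitive would compute, the sketch below folds the `truncated` send and the two range checks into one function. It is written in Python for illustration only. The name `float_to_uint8` is hypothetical and no such primitive is claimed to exist in Exupery or the Squeak VM.

```python
def float_to_uint8(value: float) -> int:
    """Truncate toward zero, then clamp the result to the range 0..255.

    Hypothetical helper mirroring `balR truncated` followed by the
    `< 0` and `> 255` checks in the quoted method.
    """
    truncated = int(value)  # int() truncates toward zero, like Smalltalk's truncated
    if truncated < 0:
        return 0
    if truncated > 255:
        return 255
    return truncated


if __name__ == "__main__":
    for sample in (-3.7, 0.9, 127.9, 255.0, 300.2):
        print(sample, float_to_uint8(sample))
```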

In summary, I think there may be some speed improvement now. Adding
the array access primitives will help. Floating point is likely to be
the next biggest win. Without SSA I doubt that other optimisations
will provide enough gain to be worthwhile. With SSA and a few extra
object types it should be possible to fully optimise it.

Bryce

(1) After upgrading the VM I'm going to implement fast compiled
primitives for #new and #@. This is driven by the largeExplorers
benchmark. #@ is inlined into the main interpret loop in the
interpreter but Exupery executes it as a normal primitive. This means
that compiling largeExplorers can lead to an 8% speed loss.

Re: Is it worth delaying the release?

Bryce Kampjes
In reply to this post by Andrew Tween
Andrew Tween writes:
 >
 > <[hidden email]> wrote in message
 > news:[hidden email]...
 > >
 > > Hi Andy, Any chance you could build a Win 32 version of the VM for the
 > > release?  VM's for other platforms would also be nice too. It would be
 > > really great to release on two platforms at once.
 > >
 > > The versions to use are:
 > >   Exupery-wbk.219
 > >   VMMAker-wbk.42
 > >
 >
 > No problem.
 > Any particular version of the SVN vm sources?
 > Are you building the vm from a 3.8 basic image, 3.8 full, or 3.9? It may not
 > make any difference, but I would like to build from the exact-same setup that
 > you use.

Thanks,

I built the VM in a 3.8 image using my normal build environment.

So probably:

  svn export http://squeakvm.org/svn/squeak/tags/unix-3.7-7

Bryce

Re: Exupery for sub-pixel font filtering.

Andrew Tween
In reply to this post by Bryce Kampjes

<[hidden email]> wrote in message
news:[hidden email]...
>
> Hello again,
> This time about sub-pixel aliasing.
>
> Andrew Tween writes:
>  > Hi Bryce,
>  > I think it is a good idea to release the solid 3.8 version.
>  >
>  > Having said that, I am looking forward to the 3.9 release because I really want
>  > to try using Exupery on my sub-pixel font filtering algorithm to see if it can
>  > speed it up. Currently this is in 3.9, and I don't want to port it all back to
>  > an earlier image/vm, especially since you are moving forward to 3.9.
>
> Exupery runs fine on 3.9, the tests just needed to be fixed.
>
> The best way to find out how it performs for your example would be to
> load Exupery into your 3.9 image and try it.

The subpixel rendering needs a modified vm (for BitBlt stuff).
And Exupery needs a modified vm.
Currently these are built from different versions of VMMaker, SVN sources, etc.
So, I am keen for them to be synchronised, and I am sure it will all come
together eventually.

In the meantime, I guess I could create a standalone benchmark, which would be
interesting in its own right.

>
>  > This is probably a topic for another thread, but could you tell from looking at
>  > the attached method if it is a good candidate for speed-up. It has nested loops,
>  > does lots of at: and integerAt:Put: (prim 166) , and SmallInteger bitShift: ,
>  > bitAnd: , *, + , // , and some Float calcs.
>
> I'm not sure how well it would run. The code is definitely a promising
> candidate to compile however Exupery doesn't yet compile Floats, large
> integers, or primitive 166. I don't think the interpreter does any
> special optimisations for them either so chances are those operations
> will run at the same speed. Exupery will be able to optimise the
> SmallInteger calculations and looping overhead.

Is the primitive compilation something that I, or others, could help with? What
is involved in adding a primitive to Exupery?

>
> The method could definitely be optimised much more. Adding
> integerAt:put: and ByteArray>>at: primitives would help. So would
> basic floating point optimisations. Going further, adding support for
> machine word (32 bit integer) and byte objects should allow us to
> compile to near C speeds.
>
> The optimisations for machine words, bytes objects, and floating point
> are all very similar. The game is to remove all the intermediate
> objects so the calculations are done directly in registers without any
> conversion and deconversion overhead.
>
>   luminance := (0.299*balR)+(0.587*balG)+(0.114*balB).
>   balR := balR + ((luminance - balR)*correctionFactor).
>   balG := balG + ((luminance - balG)*correctionFactor).
>   balB := balB + ((luminance - balB)*correctionFactor).
>   balR := balR  truncated.
>   balR < 0 ifTrue:[balR := 0] ifFalse:[balR > 255 ifTrue:[balR := 255]].
>   balG := balG  truncated.
>   balG < 0 ifTrue:[balG := 0] ifFalse:[balG > 255 ifTrue:[balG := 255]].
>   balB := balB  truncated.
>   balB < 0 ifTrue:[balB := 0] ifFalse:[balB > 255 ifTrue:[balB := 255]].
>   a := balR + balG + balB > 0 ifTrue:[16rFF] ifFalse:[0].
>   colorVal := balB + (balG bitShift: 8) +  (balR bitShift: 16) + (a bitShift: 24).
>   answer bits integerAt: (y*answer width)+(x//3+1) put: colorVal.
>
> Is a nice example to show what dynamically inlined primitives could
> do. The major overhead with floats is allocating memory (1). In this
> example, using the current optimisation engine it should be possible
> to create only 4 floats rather than 19 needed by the interpreter. One
> more allocation will be needed to form colorVal if it overflows into a
> LargeInteger. SSA should allow all the floating point intermediate
> values to be removed by allow program analysis over more than one
> statement.
>
>   balR := balR  truncated.
>   balR < 0 ifTrue:[balR := 0] ifFalse:[balR > 255 ifTrue:[balR := 255]].
>
> Should probably be handled via a primitive that truncates a floating
> point value down to an unsigned 8 bit value. For this example such a
> primitive may be overkill however converting floating point values
> to. But with Exupery 3.0 and SSA it would be really nice to be able to
> optimise to vectors. With vector optimisation we will have a level
> playing field with C, they will need at least as much compiler
> machinery as we will and they will probably write their compilers in C
> requiring much more work than writing in Smalltalk.
>
> In summary, I think there may be some speed improvement now. Adding
> the array access primitives will help. Floating point is likely to be
> the next biggest win. Without SSA I doubt that other optimisations
> will provide enough gain to be worthwhile. With SSA and a few extra
> object types it should be possible to fully optimise it.

Thanks for your comments. I had intended to re-write the method in C and add it
to the plugin, but the advantages of being able to easily play with it in
Smalltalk outweigh the speed-up of porting to C, at least while I am still
experimenting.

Cheers,
Andy


Re: Exupery for sub-pixel font filtering.

Bryce Kampjes
Andrew Tween writes:
 > Is the primitive compilation something that I, or others, could help with? What
 > is involved in adding a primitive to Exupery?
 >

Primitives vary. Simple primitives may only be two lines with two
different tests covering them. Exupery is tested both by compiling
methods then testing they work correctly and also by unit tests that
test the individual components.

   primitiveLoadInstVar: aMedPrimitive
       ^emitter
           fetchAddress: (MedLiteral literal: aMedPrimitive primitiveNumber - 264)
           ofObject: (aMedPrimitive arguments first visitWith: self)

This is the simplest primitive in Exupery at the moment. It's the
quick return primitive that returns an instance variable. It exists
as a primitive to save the cost of creating a context.

The primitives are in the category primitive in
IntermediateSimplifier. The end to end tests are in the category "Test
- Primitives" in ExuperyStoryTests. The unit tests are in "Tests -
Primitives" in IntermediateSimplifierTests.

#at: and #at:put: primitives will be much easier to write than
floating point primitives. #new involves calling C and also saving
state around a potential GC call.

Decently optimised floating point primitives will require some
cleaning up of the front end to generalise the inlining of integer
code. Ideally, it should be possible to guess the type of an operation
then to forget the guess if type feedback shows it was wrong. This
would generalize the inlining done manually for arithmetic primitives
with the dynamic primitive inlining done for #at: etc.
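
The idea of guessing a type and forgetting the guess when feedback contradicts it can be sketched with a toy call site. This is a conceptual model in Python under invented names (`SpeculativeAdd`, `assume_small_ints`, `fast_hits`). It is not how Exupery's front end is written, and a real compiler would recompile machine code rather than flip a flag.

```python
class SpeculativeAdd:
    """Toy call site that guesses both operands are small integers."""

    def __init__(self):
        self.assume_small_ints = True  # the current type guess
        self.fast_hits = 0             # how often the guess paid off

    def __call__(self, a, b):
        if self.assume_small_ints:
            if type(a) is int and type(b) is int:
                # Fast path: stands in for an inlined integer add.
                self.fast_hits += 1
                return a + b
            # Type feedback shows the guess was wrong, so forget it.
            self.assume_small_ints = False
        return self._generic_add(a, b)

    @staticmethod
    def _generic_add(a, b):
        # Stands in for a full message send or generic primitive call.
        return a + b


if __name__ == "__main__":
    site = SpeculativeAdd()
    print(site(1, 2), site.assume_small_ints)
    print(site(1.5, 2), site.assume_small_ints)
```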

 > Thanks for your comments. I had intended to re-write the method in C and add it
 > to the plugin, but the advantages of being able to easily play with it in
 > Smalltalk outweigh the speed-up of porting to C, at least while I am still
 > experimenting.

Writing a single primitive in C would be less work than optimizing
Exupery to handle everything needed for near C speeds.

Bryce

Re: Is it worth delaying the release?

Andrew Tween
In reply to this post by Bryce Kampjes
Hi Bryce,
I have built a Win32 vm.
Firstly, I tried building from a Squeak3.8.1-6747-full image, but that gave MNU
errors when generating :(.
So then I tried again with Squeak3.8-6665-full, and this time it generated, and
compiled ok.

4 tests are failing in the ExuperyStoryTests...
      #testBlockBug3
      #testBlockNonLocalReturnsRecycleContexts
      #testBlocksAndProcessesBug
      #testDelayWaitStressTest

The benchmarks are...
    arithmaticLoopBenchmark 2487 compiled 285 ratio: 8.726
    bytecodeBenchmark 4271 compiled 1255 ratio: 3.403
    sendBenchmark 3482 compiled 1772 ratio: 1.965
    doLoopsBenchmark 2078 compiled 1663 ratio: 1.250
    largeExplorers 2224 compiled 1683 ratio: 1.321
    compilerBenchmark 2093 compiled 1712 ratio: 1.223
    Cumulative Time 12903.774 compiled 4971.489 ratio 2.596

Let me know if the above indicates a fully-functioning VM and I'll let you have
it.
Cheers,
Andy


Re: Is it worth delaying the release?

Bryce Kampjes

Hi Andrew,
The VM looks fine to me. More detail below.

Andrew Tween writes:
 > Hi Bryce,
 > I have built a Win32 vm.
 > Firstly, I tried building from a Squeak3.8.1-6747-full image, but that gave MNU
 > errors when generating :(.
 > So then I tried again with Squeak3.8-6665-full, and this time it generated, and
 > compiled ok.
 >
 > 4 tests are failing in the ExuperyStoryTests...
 >       #testBlockBug3

Relies on a test from the refactoring browser.

 >       #testBlockNonLocalReturnsRecycleContexts

Another refactoring browser test is used here.

 >       #testBlocksAndProcessesBug

This one uses CommandShell which is built on top of OSProcess.

 >       #testDelayWaitStressTest

This test uses GraphViz which is used to lay out graphical inspectors
for intermediate code. This is also why I've loaded OSProcess and
CommandShell into my standard image.

I need to figure out a decent way of handling dependencies on other
packages. Exupery itself should be dependency free, but the tests are
not. I re-use tests from other packages if they catch a crash in
Exupery. This can wait though.

 > The benchmarks are...
 >     arithmaticLoopBenchmark 2487 compiled 285 ratio: 8.726
 >     bytecodeBenchmark 4271 compiled 1255 ratio: 3.403
 >     sendBenchmark 3482 compiled 1772 ratio: 1.965
 >     doLoopsBenchmark 2078 compiled 1663 ratio: 1.250
 >     largeExplorers 2224 compiled 1683 ratio: 1.321
 >     compilerBenchmark 2093 compiled 1712 ratio: 1.223
 >     Cumulative Time 12903.774 compiled 4971.489 ratio 2.596

The numbers look very good to me. The micro benchmarks are worse
than I get here and the macro benchmarks are much better.

Here are the benchmarks I get:
   arithmaticLoopBenchmark 1397 compiled 92 ratio: 15.185
   bytecodeBenchmark 2135 compiled 463 ratio: 4.611
   sendBenchmark 1576 compiled 699 ratio: 2.255
   doLoopsBenchmark 1083 compiled 841 ratio: 1.288
   largeExplorers 356 compiled 366 ratio: 0.973
   compilerBenchmark 733 compiled 708 ratio: 1.035
   Cumulative Time 4213.729 compiled 1453.554 ratio 2.899
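
The per-benchmark ratios in the list above are consistent with interpreted time divided by compiled time, so a value below 1 means the compiled code was slower. The Python sketch below recomputes them from the posted timings. The variable and function names are invented for the illustration, and the cumulative line is left out because the thread does not say how it is derived.

```python
# Interpreted and compiled timings as posted above (units are not stated).
BENCHMARKS = {
    "arithmaticLoopBenchmark": (1397, 92),
    "bytecodeBenchmark": (2135, 463),
    "sendBenchmark": (1576, 699),
    "doLoopsBenchmark": (1083, 841),
    "largeExplorers": (356, 366),
    "compilerBenchmark": (733, 708),
}


def speedup(interpreted: float, compiled: float) -> float:
    """Interpreted time over compiled time, rounded to three decimals."""
    return round(interpreted / compiled, 3)


if __name__ == "__main__":
    for name, (interpreted, compiled) in BENCHMARKS.items():
        ratio = speedup(interpreted, compiled)
        verdict = "faster" if ratio > 1 else "not faster"
        print(f"{name:26s} {ratio:7.3f}  compiled code is {verdict}")
```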

I'm running an Athlon 64 3500+ 2.2GHz.  What CPU did you use
for those benchmarks?

Bryce

Re: Is it worth delaying the release?

Andrew Tween

<[hidden email]> wrote in message
news:[hidden email]...
>
> Hi Andrew,
> The VM looks fine to me. More detail below.

Good. I'll email it to you.

>
> Andrew Tween writes:
>  > Hi Bryce,
>  > I have built a Win32 vm.
>  > Firstly, I tried building from a Squeak3.8.1-6747-full image, but that gave MNU
>  > errors when generating :(.
>  > So then I tried again with Squeak3.8-6665-full, and this time it generated, and
>  > compiled ok.
>  >
>  > 4 tests are failing in the ExuperyStoryTests...
>  >       #testBlockBug3
>
> Relies on a test from the refactoring browser.
>
>  >       #testBlockNonLocalReturnsRecycleContexts
>
> Another refactoring browser test is used here.
>
>  >       #testBlocksAndProcessesBug
>
> This one uses CommandShell which is built on top of OSProcess.
>
>  >       #testDelayWaitStressTest
>
> This test uses GraphViz which is used to lay out graphical inspectors
> for intermediate code. This is also why I've loaded OSProcess and
> CommandShell into my standard image.
>
> I need to figure out a decent way of handling dependencies on other
> packages. Exupery itself should be dependency free, but the tests are
> not. I re-use tests from other packages if they catch a crash in
> Exupery. This can wait though.
>
>  > The benchmarks are...
>  >     arithmaticLoopBenchmark 2487 compiled 285 ratio: 8.726
>  >     bytecodeBenchmark 4271 compiled 1255 ratio: 3.403
>  >     sendBenchmark 3482 compiled 1772 ratio: 1.965
>  >     doLoopsBenchmark 2078 compiled 1663 ratio: 1.250
>  >     largeExplorers 2224 compiled 1683 ratio: 1.321
>  >     compilerBenchmark 2093 compiled 1712 ratio: 1.223
>  >     Cumulative Time 12903.774 compiled 4971.489 ratio 2.596
>
> The numbers look very good to me. The micro benchmarks are worse
> than I get here and the macro benchmarks are much better.
>
> Here's the benchmarks I get:
>    arithmaticLoopBenchmark 1397 compiled 92 ratio: 15.185
>    bytecodeBenchmark 2135 compiled 463 ratio: 4.611
>    sendBenchmark 1576 compiled 699 ratio: 2.255
>    doLoopsBenchmark 1083 compiled 841 ratio: 1.288
>    largeExplorers 356 compiled 366 ratio: 0.973
>    compilerBenchmark 733 compiled 708 ratio: 1.035
>    Cumulative Time 4213.729 compiled 1453.554 ratio 2.899
>
> I'm running an Athlon 64 3500+ 2.2GHz.  What CPU did you use
> for those benchmarks?

Pentium 3 Mobile. 1133MHz.
I need a new PC - then I'd get twice as much work done ;)
Cheers,
Andy


Re: Is it worth delaying the release?

Bryce Kampjes
Andrew Tween writes:
 >
 > <[hidden email]> wrote in message
 > news:[hidden email]...
 > >
 > > Hi Andrew,
 > > The VM looks fine to me. More detail below.
 >
 > Good. I'll email it to you.

Thanks, I've got the email, I'll upload it tomorrow night.

Bryce