On Jul 23, 2007, at 4:03 PM, Ron Teitelbaum wrote:

> Hey Derek,
>
> Ok I admit I'm only just scanning these emails but I have to ask the
> question: Why in the world would you need 250 fps?

I don't think that Derek introduced the 250 fps figure in this thread.

> When I was 6 years old I worked at an animation camera company across
> from Hanna-Barbera. At the time the cameramen were up in arms because
> standard cartoons were going from 12 to 7 fps. The cameramen were
> arguing that 7 fps was too slow and would show a flicker (the cartoons
> looked fine to me, I think they were upset about the money they were
> losing doing the shooting). From my time working in the movie business
> later I know that most films are like 25 fps. When you see someone
> running on a movie screen they don't look like they are jumping from
> place to place.

That's a good point, although it depends on what is being displayed. Most of the time 25 fps is OK, but if the camera pans rapidly (which it doesn't do often, probably for this reason) the jerkiness is quite visible. IMAX films run at 24 fps, and fly-over sequences appear very jerky to me.

In interactive 3d environments, there is probably more rapid panning than in feature films, so a higher frame rate is desirable.

Also, I think that our standards have gone up. I used to think that the "hi-res" images on our Apple IIGS looked real at 640x200; now it would take a 30-inch, 200 dpi monitor to impress me like that.

> There is no way the human eye could discern 250 fps.

Nor could the monitor display it.

> A frame rate of 12 is really good, 25 is excellent, 250 is nuts.

For me, a frame rate of 12 is fairly annoying, 25 is pretty good, 50 is excellent, and 250 is... nuts.

Josh

> Am I missing something?
>
> Ron
>
>> -----Original Message-----
>> From: David P. Reed [mailto:[hidden email]]
>> Sent: Monday, July 23, 2007 2:03 PM
>> To: [hidden email]; Derek Arndt
>> Subject: Re: [croquet-dev] Rendering Performance
>>
>> Derek - I confess that I haven't been tracking the current code body,
>> since my focus is on researching a next generation of TeaTime. But
>> each rendering pass schedules the next one by doing a future:send:
>> operation to render the space. The delay until the next rendering is
>> presumably still calculated in an adaptive manner, taking into
>> account a variety of performance measures, like how far behind the
>> current rendering is from real time.
>>
>> There is one other thing that could be a systemic problem, which
>> relates to how the Squeak VM you are using decides to sleep. When
>> there is nothing for Squeak to do, it tells the operating system it
>> wants to sleep. Now in ancient systems (MacOS and some
>> Unixes/Linuxes) the sleeping function in a process has a very poor
>> time resolving capability - typically you sleep waiting for either an
>> interrupt or an I/O transition into the process, or an elapsed time.
>> On Unix, this is a select() or a poll() operation.
>>
>> When rendering says "run the render step again in 20 msec." it does
>> no good if what happens is that the Squeak VM runs out of things to
>> do and sleeps in the OS using a call that cannot respond in less than
>> 50 msec. The next render step won't happen soon enough. Squeak's VM
>> has had a variety of kludges put in to deal with that, and since
>> there are many hands in the Squeak VM, this often breaks.
>>
>> I have had to put in a fix for this kind of thing in the past because
>> the network driver wasn't properly prioritizing wakeups to pull the
>> machine out of sleeping. I think that fix is still in there and
>> correct. But a VM maintainer might want to look at that also.
>>
>> Derek Arndt wrote:
>>> Hey David,
>>>
>>> Thanks for the confirmation of my suspicions. Like any serious
>>> graphics application it's not about achieving the highest framerate
>>> (VBL sync takes care of limiting the CPU and GPU nicely) but rather
>>> about being able to throw as much on the screen as possible and
>>> still achieve an experience that's not a slideshow (250FPS works
>>> while 2.50FPS doesn't). The technologies and techniques I'm
>>> currently using have been around since the early part of this decade
>>> and with this small amount of action in Croquet the framerate is
>>> unacceptable.
>>>
>>> I spent a few minutes tracing through the render routines to try and
>>> find the scheduler limiting to play around with (who knows, maybe
>>> I'll find out from this further profiling that I'm doing something
>>> horribly wrong, the more I can understand the better though) - so
>>> far no luck. Could you point me in the right direction or give me
>>> some quick insight into where I should be looking?
>>>
>>> Thanks again - I appreciate your time,
>>> Derek
>>>
>>> On Jul 20, 2007, at 5:26 PM, David P. Reed wrote:
>>>
>>>> You are not bottlenecked by Squeak, but instead are idle. And the
>>>> reason is quite simple, I suspect. Croquet is not designed to
>>>> flicker your screen at 250 fps or whatever just so you can say "my
>>>> graphics card is fast". It actually limits the framerate, so that
>>>> other computational tasks you might run are allowed to run, and if
>>>> those are quiet, the framerate is NOT increased.
>>>>
>>>> Some video gamers focus on framerate as their model of systems
>>>> performance. That really seems irrelevant to Croquet. But if you
>>>> want a higher framerate, just find the scheduling of the redraw
>>>> task, and set it to run at a high framerate. You may not be able to
>>>> do anything else, but you'll have your eyecandy and bragging
>>>> rights.
>>>>
>>>> Derek Arndt wrote:
>>>>> Thanks John,
>>>>>
>>>>> I'm coming back to this specific issue as I would like to have a
>>>>> better understanding of where the framerate is going. With many
>>>>> optimized objects in the scene I am unable to obtain an acceptable
>>>>> framerate and I can't help but feel I'm being limited by the rate
>>>>> at which key methods are being fired in squeak.
>>>>>
>>>>> I've adjusted the configuration in the plist without any
>>>>> noticeable results. Below is a link to the shark profile - I'm not
>>>>> used to reading much outside of the application's symbols (and
>>>>> because this code isn't C the results aren't very useful for the
>>>>> app itself). I'm interested to see what you can glean from the
>>>>> profile if anything:
>>>>>
>>>>> http://img409.imageshack.us/my.php?image=picture1hj4.png
>>>>>
>>>>> Interestingly enabling higherPerformance doesn't change the low
>>>>> framerate when rendering many objects on screen (which I would
>>>>> take to mean the bottleneck is in rendering the objects - but the
>>>>> framerates are much less than I would expect)
>>>>>
>>>>> Thanks for your time John,
>>>>> Derek
>>>>>
>>>>> On Jul 20, 2007, at 10:46 AM, Peter Moore wrote:
>>>>>
>>>>>>> *From: *John M McIntosh <[hidden email] <mailto:[hidden email]>>
>>>>>>> *Date: *June 12, 2007 3:21:01 AM CDT
>>>>>>> *To: *[hidden email] <mailto:[hidden email]>, Derek Arndt <[hidden email] <mailto:[hidden email]>>
>>>>>>> *Subject: **Re: [croquet-dev] Rendering Performance*
>>>>>>> *Reply-To: *[hidden email] <mailto:[hidden email]>, John M McIntosh <[hidden email] <mailto:[hidden email]>>
>>>>>>>
>>>>>>> Derek I've never looked at the Open/GL croquet drawing with
>>>>>>> relationship to the squeak displayscreen drawing. If you are
>>>>>>> using the carbon macintosh vm, there are some info.plist parms,
>>>>>>> see http://www.smalltalkconsulting.com/html/squeakinfoplist.html
>>>>>>> say SqueakUIFlushPrimaryDeferNMilliseconds, which affect drawing
>>>>>>> performance for the non open/gl stuff.
>>>>>>>
>>>>>>> Use of Apple's Shark should tell you where the time is going,
>>>>>>> and how much clock time is used sitting in the VM's sleeping
>>>>>>> primitive.
>>>>>>>
>>>>>>> On Jun 8, 2007, at 1:11 PM, Derek Arndt wrote:
>>>>>>>
>>>>>>>> Hey everyone - this is my first time posting to the list.
>>>>>>>> I've been a Mac shareware game developer for a number of years
>>>>>>>> at http://www.batteryacid.org and now I'm jumping straight into
>>>>>>>> croquet.
>>>>>>>>
>>>>>>>> One of the first things to strike me that I haven't been able
>>>>>>>> to improve is the low framerate. In the simplest of demos I'm
>>>>>>>> experiencing a maximum of 42FPS on a MacBook Pro (CPU not
>>>>>>>> pegged, almost zero vertices or fragments on the screen). I've
>>>>>>>> spent some time using the popular debugging tools and
>>>>>>>> experimenting by rendering nothing in TSpace ->
>>>>>>>> renderSpace:port:depth:ghostFrame: and reducing the large
>>>>>>>> number of GL state changes submitted to the driver.
>>>>>>>>
>>>>>>>> Sure it's easy to find obvious places of graphical improvement,
>>>>>>>> but getting little improvement and not finding a clear
>>>>>>>> graphical bottleneck leads me to think there is other limiting
>>>>>>>> in place (or timers that specify how often to render). Is this
>>>>>>>> correct?
>>>>>>>>
>>>>>>>> Thanks for your time,
>>>>>>>> Derek Arndt
>>>>>>>>
>>>>>>>> PS. Does croquet have VBL syncing?
>>>>>>>
>>>>>>> --
>>>>>>> ===========================================================================
>>>>>>> John M. McIntosh <[hidden email] <mailto:[hidden email]>>
>>>>>>> Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
>>>>>>> ===========================================================================
|
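To put numbers on the sleep-resolution point quoted above ("run the render step again in 20 msec" versus an OS sleep that cannot respond in less than 50 msec), here is a minimal Python sketch. It is a toy model, not Croquet or Squeak VM code: it assumes the VM can only wake on multiples of a fixed sleep granularity and it ignores the time spent rendering. The function names and the 1 msec comparison value are made up for illustration.

```python
import math


def effective_frame_interval_ms(requested_ms: float, sleep_granularity_ms: float) -> float:
    """Round the requested delay up to the next wakeup the OS can deliver.

    Assumption (hypothetical model): the process only wakes on whole
    multiples of sleep_granularity_ms.
    """
    if requested_ms <= 0 or sleep_granularity_ms <= 0:
        raise ValueError("delays must be positive")
    ticks = math.ceil(requested_ms / sleep_granularity_ms)
    return ticks * sleep_granularity_ms


def effective_fps(requested_ms: float, sleep_granularity_ms: float) -> float:
    """Frame rate that results once the sleep granularity is taken into account."""
    return 1000.0 / effective_frame_interval_ms(requested_ms, sleep_granularity_ms)


if __name__ == "__main__":
    # The scheduler asks for 20 msec (50 fps), but the sleep call resolves 50 msec.
    print(effective_fps(20, 50))
    # With a 1 msec sleep resolution the requested rate is reachable.
    print(effective_fps(20, 1))
```

Under this model the renderer asking for 50 fps would be held to 20 fps by the coarse sleep alone, with the CPU mostly idle, which is the kind of symptom described in the quoted messages.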
Actually, the human eye can discern 250 fps flicker if you try to track a very fast moving object by moving your eyes. (What you need is an object that moves more than its own width each frame time, and you need to focus on it, not the scene - you will see flickering.)
|
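The condition stated above (an object that moves more than its own width per frame) is easy to check numerically. The following Python sketch only restates that condition; the object width and speed are invented example values, and it says nothing about perception beyond the rule given in the message.

```python
def displacement_per_frame(speed_px_per_s: float, fps: float) -> float:
    """Distance in pixels the object travels between two consecutive frames."""
    return speed_px_per_s / fps


def moves_more_than_own_width(width_px: float, speed_px_per_s: float, fps: float) -> bool:
    """True when successive images of the object do not overlap on screen."""
    return displacement_per_frame(speed_px_per_s, fps) > width_px


if __name__ == "__main__":
    # Hypothetical example: a 10 px wide object crossing the screen at 5000 px/s.
    for fps in (25, 250, 1000):
        step = displacement_per_frame(5000, fps)
        print(fps, step, moves_more_than_own_width(10, 5000, fps))
```

For this made-up object, even 250 fps leaves a 20 px step between frames, twice the object's width, so the condition in the message is still met.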
In reply to this post by Ron Teitelbaum
Just guessing here, but I think 250 fps might have been a typo.
-Peter
|
In reply to this post by David P. Reed
Yeah that makes sense, but I guess it is what I would expect to see for a fast moving object rendered on film or computer. Well, actually on film you would see a moving smudge (a ball would be elongated along the travel path). Maybe rendered 3d objects that are moving quickly should do something similar?

You know it's amazing to think that eventually our hardware/software will produce extremely realistic interactive 3d rendering. My thoughts were that it is unrealistic to expect anything more from the computer than what is currently available for film. I suppose that it is realistic to expect more.

Very interesting stuff.

Ron

> -----Original Message-----
> From: David P. Reed [mailto:[hidden email]]
> Sent: Monday, July 23, 2007 8:47 PM
> To: [hidden email]; Joshua Gargus
> Cc: Ron Teitelbaum; 'Derek Arndt'
> Subject: Re: [croquet-dev] Rendering Performance
>
> Actually, the human eye can discern 250 fps flicker, if you try to track
> a very fast moving object by moving your eyes. (what you need is an
> object that moves more than its width each frame time, and you need to
> focus on it, not the scene - you will see flickering).
|
I think one of the key differences between film and interactive computer environments (first person shooter games especially) is that with film the "future" is already determined. That is, what you are going to see on the next frame is already known and has been captured (either on film or digitally) in a way that looks pleasing (hopefully). The shutter speed relative to the film/sensor speed, the type of motion in the shot and the frame rate are all set based on the art and science of cinematography. So, in some shots there will be motion blur because the movement of the subject is faster than the shutter speed, whereas in others the film and shutter speed might be faster to capture the motion in a more detailed fashion (e.g. sports slow motion).

For computer games and interactive systems, the future is determined based on various factors. The sensation of fast response times to user input or other changing conditions is very dependent on how quickly the scene can visually be updated to reflect the changes due to input. So in that sense it is less about realistic looking animation than it is about the feeling of things happening instantly at your command. The FPS measurement so often quoted for gaming performance is really indicative of how responsive the game will be to play on a given system. This can of course translate into visual "smoothness" but is rather different from the idea of flicker or jerk that happens at frame rates below 24 or 12fps (for example).

Furthermore, as was mentioned earlier, panning and the perception of a fairly static subject against a moving background are rather different beasts on computers as compared to film capturing real life. On film, motion blur in the background can aid the shot by drawing attention to the relatively static subject, which is crisp and clear. In pretty much all games I've seen, and I think in all films with a large amount of digital CG content, the handling of motion blur (or the lack of it) in the background and on subjects is what stops it looking like film.

So anyway, the point is, in certain environments, comparing 150fps to 50fps to 25fps shows a big difference in user experience, but it is not really to do with animation frame rates in the traditional sense.

Of course it all depends on how this figure is measured. In the Croquet sense, a low quoted frame rate could simply mean there is nothing to update. In a game, it is more like a measurement of how much responsiveness is in reserve - because even if the system could render the bits at 250fps, the screen couldn't (yet).

Wow, an essay. Carry on.

On 24/07/07, Ron Teitelbaum <[hidden email]> wrote:
Yeah that makes sense, but I guess it is what I would expect to see for a |
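One way to read the responsiveness argument in the message above is in terms of frame intervals rather than frame rates. The Python sketch below converts the rates mentioned there into milliseconds per frame. Treating one frame interval as the worst-case wait before a user action can appear on screen is a simplifying assumption made here for illustration (input sampled once per frame, no display or driver latency); it is not a claim from the thread.

```python
def frame_interval_ms(fps: float) -> float:
    """Time between two consecutive frames, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


if __name__ == "__main__":
    # Rates taken from the discussion above.
    for fps in (25, 50, 150, 250):
        # Assumption: an input arriving just after a frame started is
        # reflected one frame interval later at the earliest.
        print(f"{fps:>3} fps -> {frame_interval_ms(fps):.1f} ms per frame")
```

Going from 25 fps to 150 fps shortens the interval from 40 ms to under 7 ms, which is a difference in reaction time rather than in animation smoothness.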
In reply to this post by Ron Teitelbaum
On Jul 24, 2007, at 3:48, Ron Teitelbaum wrote:

> Yeah that makes sense, but I guess it is what I would expect to see
> for a fast moving object rendered on film or computer. Well actually
> on film you would see a moving smudge (a ball would be elongated
> along the travel path). Maybe rendered 3d objects that are moving
> quickly should do something similar?

Which is called temporal anti-aliasing, well known in the computer graphics community, and achievable in real-time nowadays.

That aside: You have to see it to believe it - a CRT driven by a renderer in frame lock, that is, with the rendering frame rate matching the CRT's refresh rate (enable "vsync" in your graphics driver). It's an incredibly smooth animation, the object looks *real*. This cannot be experienced on an LCD, no matter the frame rate.

I saw this for the first time with 3D graphics about 15 years ago when our department got a shiny new SGI Onyx. The fly demo was incredible. It was running in Irix Performer, which uses many techniques to ensure a steady frame rate (including heavy LODing and even lowering the resolution on the fly when bandwidth became an issue), because even a single glitch ruins the experience.

That means, for stereo rendering your *minimal* framerate would have to be 150 Hz (75 per eye). And to maintain that frame rate, your theoretical max frame rate (if it wasn't vsynced) for less complex scenes will obviously shoot through the roof.

It's true that you can get work done and even enjoy lower frame rates, you've probably been conditioned to accept those anyway, but that's far from ideal. I'd say a rendering frame rate matching the monitor refresh rate under any and all circumstances would be the target we should aim at. And I'm sure we'll get there, eventually ;)

- Bert -
|
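The frame-lock argument above can be sketched with a little arithmetic. The Python snippet below assumes plain double-buffered vsync, where a frame that misses a refresh has to wait for the next one, so the displayed rate snaps to the refresh rate divided by a whole number. That model, the function names and the render times are illustrative assumptions (triple buffering, for instance, behaves differently); only the 75 Hz per eye figure comes from the message.

```python
import math


def vsynced_fps(refresh_hz: float, render_ms: float) -> float:
    """Displayed frame rate under double-buffered vsync (assumed model).

    A frame whose rendering takes longer than one refresh interval is
    shown at the next refresh, so the rate becomes refresh_hz / n.
    """
    refresh_interval_ms = 1000.0 / refresh_hz
    intervals = max(1, math.ceil(render_ms / refresh_interval_ms))
    return refresh_hz / intervals


def stereo_render_rate(per_eye_hz: float) -> float:
    """Frames per second needed when each eye gets its own image."""
    return 2 * per_eye_hz


if __name__ == "__main__":
    print(vsynced_fps(75, 10))      # fits into one 13.3 ms refresh interval
    print(vsynced_fps(75, 14))      # just misses it and drops to half rate
    print(stereo_render_rate(75))   # the 150 Hz quoted for stereo above
```

Under this model, missing the refresh deadline by a fraction of a millisecond halves the displayed rate, which is why a renderer that must never glitch needs a lot of headroom on simple scenes.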
A lot of this discussion seems to be missing Derek's original point, which if I
take it correctly was about the frame rate dropping down to 2 to 5 fps when more objects are added to the scene. I'd say that this is a serious concern considering that any simulation scene that is really worth looking at is likely to have a number of objects in it. So the question is, where is the bottleneck that can be optimized to get the framerate back into the 20-40 range in a scene with lots of objects? And the related question, about how many objects is a lot or too many? |
Thanks David, this is the point I was trying to make - I apologize for not being clear enough and for spinning off this rendering performance talk, which I'm sure we all actually feel similarly about.

So I'm interested to see where the slow-downs could be: is it the FFI layer, or improperly handled sleeping perhaps? I appreciate any insight I can get.

Thanks,
Derek

On Jul 24, 2007, at 9:55 AM, David Faught wrote:

> A lot of this discussion seems to be missing Derek's original point,
> which if I take it correctly was about the frame rate dropping down to
> 2 to 5 fps when more objects are added to the scene. I'd say that this
> is a serious concern considering that any simulation scene that is
> really worth looking at is likely to have a number of objects in it.
> So the question is, where is the bottleneck that can be optimized to
> get the framerate back into the 20-40 range in a scene with lots of
> objects? And the related question, about how many objects is a lot or
> too many?
|
Derek Arndt wrote:
> So I'm interested to see where the slow-downs could be, is it FFI layer
> or improperly handled sleeping perhaps? I appreciate any insight I can
> get.

First thing is to measure it:
* Launch Croquet
* From the desktop menu choose "debug", "start MessageTally"
* After driving around for a little, move your mouse to the top-pixel of the window (takes some targeting but can be done ;-)

You will get a message tally output which will tell you where you spend most of your time. Start optimizing there, rinse and repeat.

Cheers,
-Andreas
|
Hey Andreas,
In one example this ATI X1600 can only achieve 9FPS when drawing 5,000 triangles (sure it's not done in one draw routine and there are state changes often, but this is still surprisingly bad).

Pasted below is the profile from rendering part of this scene - in it, it appears as though (going off this CPU time) the processor is doing a bit of waiting. As I don't expect GL to be that slow on this machine, I'm looking at these math routines that already appear optimized and wondering about the waits.

Any opinions on the profile?

Thanks,
Derek

- 15311 tallies, 15455 msec.

**Tree**
90.1% {13925ms} TRemoteControllerConnection(Object)>>fork:at:
|78.4% {12117ms} CLADemoRecordableHarness(CroquetHarness)>>renderProcess
| |78.4% {12117ms} CLADemoRecordableHarness(CroquetHarness)>>renderWorld
| | 77.3% {11947ms} TMessageMaker>>doesNotUnderstand:
| | 77.3% {11947ms} TFarRef>>syncSend:
| | 77.3% {11947ms} TFarRef>>syncSend:withArguments:
| | 77.2% {11931ms} TPortal>>renderView:with:
| | 77.1% {11916ms} TFarRef>>renderView:overlay:
| | 77.1% {11916ms} TMessageMaker>>doesNotUnderstand:
| | 77.1% {11916ms} TFarRef>>syncSend:
| | 77.1% {11916ms} TFarRef>>syncSend:withArguments:
| | 77.1% {11916ms} TBoundedSpace(TFrame)>>renderView:overlay:
| | 77.1% {11916ms} TCamera>>renderView:space:overlay:
| | 76.5% {11823ms} TBoundedSpace(TSpace)>>renderSpace:
| | 76.5% {11823ms} TBoundedSpace(TSpace)>>renderSpace:port:depth:ghostFrame:
| | 71.6% {11066ms} TBoundedSpace(TFrame)>>renderFrame:
| | |69.3% {10710ms} TQuadTree>>renderFrame:
[69.3% {10710ms} TQuadTree(TFrame)>>renderFrame:
[ 69.3% {10710ms} TGroup(TFrame)>>renderFrame:
[ 68.7% {10618ms} TMesh(TFrame)>>renderFrame:
[ 30.0% {4637ms} TCamera>>testBounds:
[ |13.8% {2133ms} TCamera(TFrame)>>lookAt
[ | |8.8% {1360ms} Matrix4x4>>column3
[ | | |5.9% {912ms} Vector3 class>>x:y:z:
[ | | | 2.9% {448ms} Vector3>>x:y:z:
[ | |3.5% {541ms} Vector3>>normalized
[ |10.5% {1623ms} TCamera(TFrame)>>globalPosition
[ | |9.1% {1406ms} Matrix4x4>>translation
[ | | 6.0% {927ms} Vector3 class>>x:y:z:
[ | | 2.8% {433ms} Vector3>>x:y:z:
[ | | 2.1% {325ms} Vector3 class(Vector class)>>new
[ |3.8% {587ms} primitives
[ 19.9% {3076ms} TMesh>>render:
[ |19.8% {3060ms} TMesh>>renderPrimitive:alpha:
[ | 15.8% {2442ms} TMaterial>>enable:
[ | 15.3% {2365ms} TTexture>>enable:
[ | 14.8% {2287ms} OGLMacOSX(OpenGL)>>installTexture:
[ | 10.3% {1592ms} OGLTextureManager>>bindTexture:
[ | |10.0% {1546ms} OGLTextureManager>>textureHandleOf:
[ | | 5.1% {788ms} TObjectID(SequenceableCollection)>>=
[ | | |4.4% {680ms} TObjectID(SequenceableCollection)>>hasEqualElements:
[ | | | 3.6% {556ms} primitives
[ | | 2.5% {386ms} Dictionary>>at:ifAbsentPut:
[ | | 2.2% {340ms} Dictionary>>at:ifAbsent:
[ | 4.0% {618ms} TFormManager>>resolve:distance:
[ 6.4% {989ms} TMesh(TFrame)>>globalPosition
[ |5.5% {850ms} Matrix4x4>>translation
[ | 3.6% {556ms} Vector3 class>>x:y:z:
[ 4.2% {649ms} OGLMacOSX(OpenGL)>>pushMatrix
[ |2.1% {325ms} Matrix4x4 class(Vector class)>>new
[ 4.1% {634ms} OGLMacOSX(OpenGL)>>popMatrix
[ 2.9% {448ms} OrderedCollection>>removeLast
[ 2.0% {309ms} OrderedCollection(Collection)>>emptyCheck
| | 4.9% {757ms} TBoundedSpace(TSpace)>>renderSpaceAlpha:transform:
| | 4.7% {726ms} TAnimatedMesh(TFrame)>>doRenderAlpha:
[3.8% {587ms} TAnimatedMesh>>render:
[ 3.5% {541ms} TAnimatedMesh>>updateAnimations:
[ 3.5% {541ms} TSkeletonAnimation>>transformBones:atTime:
|3.9% {603ms} TMessageRouterClient>>runHeartbeat
| |3.9% {603ms} Delay>>wait
| | 3.9% {603ms} Semaphore>>wait
|3.3% {510ms} TSimpleController(TIslandController)>>runEventLoop
| |3.3% {510ms} TSimpleController(TIslandController)>>advanceTo:
| | 2.1% {325ms} TSimpleController(TIslandController)>>processMessages
| | 2.1% {325ms} TMessageMaker>>doesNotUnderstand:
| | 2.1% {325ms} TFarRef>>syncSend:
| | 2.0% {309ms} TFarRef>>syncSend:withArguments:
|2.2% {340ms} TLocalController>>runHeartbeat
| |2.2% {340ms} Delay>>wait
| | 2.2% {340ms} Semaphore>>wait
|2.1% {325ms} TRemoteControllerConnection(TMessageRelay)>>runReaderProcess
| 2.0% {309ms} Socket>>waitForData
| 2.0% {309ms} Socket>>waitForDataIfClosed:
4.8% {742ms} PasteUpMorph>>doOneCycle
|4.8% {742ms} WorldState>>doOneCycleFor:
| 4.8% {742ms} WorldState>>doOneCycleNowFor:
| 3.5% {541ms} PasteUpMorph>>runStepMethods
| 3.5% {541ms} WorldState>>runStepMethodsIn:
| 3.5% {541ms} WorldState>>runLocalStepMethodsIn:
| 3.3% {510ms} StepMessage(MorphicAlarm)>>value:
| 2.5% {386ms} MiniPlazaMaster(Morph)>>stepAt:
| 2.4% {371ms} MiniPlazaMaster(CroquetParticipantWithMenu)>>step
| 2.2% {340ms} MiniPlazaMaster(BFDParticipant)>>step
| 2.2% {340ms} MiniPlazaMaster(CroquetParticipant)>>step
| 2.1% {325ms} CLADemoRecordableHarness(CroquetHarness)>>step
| 2.1% {325ms} OrderedCollection>>do:
3.2% {495ms} ScriptProcess>>newScript
3.2% {495ms} ScriptProcess>>privateRunMsg
2.9% {448ms} AsyncScriptMessageSend>>value
2.9% {448ms} AsyncScriptMessageSend(ScriptMessageSend)>>valueWithEvent:
2.9% {448ms} AsyncScriptMessageSend(ScriptMessageSend)>>synchronousValueWithEvent:
2.9% {448ms} AsyncScriptMessageSend(ScriptMessageSend)>>synchronousValueWithArguments:event:

**Leaves**
9.8% {1515ms} Semaphore>>wait
4.1% {634ms} TObjectID(SequenceableCollection)>>hasEqualElements:
3.9% {603ms} TCamera>>testBounds:
2.9% {448ms} OrderedCollection>>do:
2.9% {448ms} Vector3 class(Behavior)>>new:
2.8% {433ms} Vector3 class>>x:y:z:
2.6% {402ms} Vector3>>x:y:z:
2.5% {386ms} Vector3 class(Vector class)>>new

**Memory**
old +648,888 bytes
young -111,464 bytes
used +537,424 bytes
free -537,424 bytes

**GCs**
full 0 totalling 0ms (0.0% uptime)
incr 3068 totalling 1,170ms (8.0% uptime), avg 0.0ms
tenures 9 (avg 340 GCs/tenure)
root table 0 overflows

On Jul 24, 2007, at 12:31 PM, Andreas Raab wrote:

> Derek Arndt wrote:
>> So I'm interested to see where the slow-downs could be, is it FFI
>> layer or improperly handled sleeping perhaps? I appreciate any
>> insight I can get.
>
> First thing is to measure it:
> * Launch Croquet
> * From the desktop menu choose "debug", "start MessageTally"
> * After driving around for a little move your mouse to the top-
> pixel of the window (takes some targeting but can be done ;-)
> You will get a message tally output which will tell you where you
> spend most of your time. Start optimizing there, rinse and repeat.
>
> Cheers,
> -Andreas
|
Hi Derek,

Looking at the profile, two things stand out for me:

30.0% {4637ms} TCamera>>testBounds:

I'm not sure why this is so high; is it possible that you have a ton of tiny meshes in your scene? If so, it might help to merge these many tiny meshes into a few larger ones.

5.1% {788ms} TObjectID(SequenceableCollection)>>=

This is a problem we fixed recently by adding a primitive for byte array comparisons. A not quite as good fix is the following (which will also significantly improve these comparisons):

TObjectID>>hasEqualElements: aByteArray
    ^(String compare: self with: aByteArray) = 2

Cheers,
  - Andreas

Derek Arndt wrote:
> Hey Andreas,
>
> In one example this ATI X1600 can only achieve 9FPS when drawing 5,000
> triangles (sure it's not done in one draw routine and there are state
> changes often, but this is still surprisingly bad).
>
> Pasted below is the profile from rendering part of this scene - in it it
> appears as though (going off this CPU time) the processor is doing a bit
> of waiting. As I don't expect GL to be that slow on this machine, I'm
> looking at these math routines that already appear optimized and
> wondering about the waits.
>
> Any opinions on the profile?
>
> Thanks,
> Derek
>
> - 15311 tallies, 15455 msec.
>
> **Tree**
> 90.1% {13925ms} TRemoteControllerConnection(Object)>>fork:at:
> |78.4% {12117ms} CLADemoRecordableHarness(CroquetHarness)>>renderProcess
> | |78.4% {12117ms} CLADemoRecordableHarness(CroquetHarness)>>renderWorld
> | | 77.3% {11947ms} TMessageMaker>>doesNotUnderstand:
> | | 77.3% {11947ms} TFarRef>>syncSend:
> | | 77.3% {11947ms} TFarRef>>syncSend:withArguments:
> | | 77.2% {11931ms} TPortal>>renderView:with:
> | | 77.1% {11916ms} TFarRef>>renderView:overlay:
> | | 77.1% {11916ms} TMessageMaker>>doesNotUnderstand:
> | | 77.1% {11916ms} TFarRef>>syncSend:
> | | 77.1% {11916ms} TFarRef>>syncSend:withArguments:
> | | 77.1% {11916ms} TBoundedSpace(TFrame)>>renderView:overlay:
> | | 77.1% {11916ms} TCamera>>renderView:space:overlay:
> | | 76.5% {11823ms} TBoundedSpace(TSpace)>>renderSpace:
> | | 76.5% {11823ms} TBoundedSpace(TSpace)>>renderSpace:port:depth:ghostFrame:
> | | 71.6% {11066ms} TBoundedSpace(TFrame)>>renderFrame:
> | | |69.3% {10710ms} TQuadTree>>renderFrame:
> [69.3% {10710ms} TQuadTree(TFrame)>>renderFrame:
> [ 69.3% {10710ms} TGroup(TFrame)>>renderFrame:
> [ 68.7% {10618ms} TMesh(TFrame)>>renderFrame:
> [ 30.0% {4637ms} TCamera>>testBounds:
> [ |13.8% {2133ms} TCamera(TFrame)>>lookAt
> [ | |8.8% {1360ms} Matrix4x4>>column3
> [ | | |5.9% {912ms} Vector3 class>>x:y:z:
> [ | | | 2.9% {448ms} Vector3>>x:y:z:
> [ | |3.5% {541ms} Vector3>>normalized
> [ |10.5% {1623ms} TCamera(TFrame)>>globalPosition
> [ | |9.1% {1406ms} Matrix4x4>>translation
> [ | | 6.0% {927ms} Vector3 class>>x:y:z:
> [ | | 2.8% {433ms} Vector3>>x:y:z:
> [ | | 2.1% {325ms} Vector3 class(Vector class)>>new
> [ |3.8% {587ms} primitives
> [ 19.9% {3076ms} TMesh>>render:
> [ |19.8% {3060ms} TMesh>>renderPrimitive:alpha:
> [ | 15.8% {2442ms} TMaterial>>enable:
> [ | 15.3% {2365ms} TTexture>>enable:
> [ | 14.8% {2287ms} OGLMacOSX(OpenGL)>>installTexture:
> [ | 10.3% {1592ms} OGLTextureManager>>bindTexture:
> [ | |10.0% {1546ms} OGLTextureManager>>textureHandleOf:
> [ | | 5.1% {788ms} TObjectID(SequenceableCollection)>>=
> [ | | |4.4% {680ms} TObjectID(SequenceableCollection)>>hasEqualElements:
> [ | | | 3.6% {556ms} primitives
> [ | | 2.5% {386ms} Dictionary>>at:ifAbsentPut:
> [ | | 2.2% {340ms} Dictionary>>at:ifAbsent:
> [ | 4.0% {618ms} TFormManager>>resolve:distance:
> [ 6.4% {989ms} TMesh(TFrame)>>globalPosition
> [ |5.5% {850ms} Matrix4x4>>translation
> [ | 3.6% {556ms} Vector3 class>>x:y:z:
> [ 4.2% {649ms} OGLMacOSX(OpenGL)>>pushMatrix
> [ |2.1% {325ms} Matrix4x4 class(Vector class)>>new
> [ 4.1% {634ms} OGLMacOSX(OpenGL)>>popMatrix
> [ 2.9% {448ms} OrderedCollection>>removeLast
> [ 2.0% {309ms} OrderedCollection(Collection)>>emptyCheck
> | | 4.9% {757ms} TBoundedSpace(TSpace)>>renderSpaceAlpha:transform:
> | | 4.7% {726ms} TAnimatedMesh(TFrame)>>doRenderAlpha:
> [3.8% {587ms} TAnimatedMesh>>render:
> [ 3.5% {541ms} TAnimatedMesh>>updateAnimations:
> [ 3.5% {541ms} TSkeletonAnimation>>transformBones:atTime:
> |3.9% {603ms} TMessageRouterClient>>runHeartbeat
> | |3.9% {603ms} Delay>>wait
> | | 3.9% {603ms} Semaphore>>wait
> |3.3% {510ms} TSimpleController(TIslandController)>>runEventLoop
> | |3.3% {510ms} TSimpleController(TIslandController)>>advanceTo:
> | | 2.1% {325ms} TSimpleController(TIslandController)>>processMessages
> | | 2.1% {325ms} TMessageMaker>>doesNotUnderstand:
> | | 2.1% {325ms} TFarRef>>syncSend:
> | | 2.0% {309ms} TFarRef>>syncSend:withArguments:
> |2.2% {340ms} TLocalController>>runHeartbeat
> | |2.2% {340ms} Delay>>wait
> | | 2.2% {340ms} Semaphore>>wait
> |2.1% {325ms} TRemoteControllerConnection(TMessageRelay)>>runReaderProcess
> | 2.0% {309ms} Socket>>waitForData
> | 2.0% {309ms} Socket>>waitForDataIfClosed:
> 4.8% {742ms} PasteUpMorph>>doOneCycle
> |4.8% {742ms} WorldState>>doOneCycleFor:
> | 4.8% {742ms} WorldState>>doOneCycleNowFor:
> | 3.5% {541ms} PasteUpMorph>>runStepMethods
> | 3.5% {541ms} WorldState>>runStepMethodsIn:
> | 3.5% {541ms} WorldState>>runLocalStepMethodsIn:
> | 3.3% {510ms} StepMessage(MorphicAlarm)>>value:
> | 2.5% {386ms} MiniPlazaMaster(Morph)>>stepAt:
> | 2.4% {371ms} MiniPlazaMaster(CroquetParticipantWithMenu)>>step
> | 2.2% {340ms} MiniPlazaMaster(BFDParticipant)>>step
> | 2.2% {340ms} MiniPlazaMaster(CroquetParticipant)>>step
> | 2.1% {325ms} CLADemoRecordableHarness(CroquetHarness)>>step
> | 2.1% {325ms} OrderedCollection>>do:
> 3.2% {495ms} ScriptProcess>>newScript
> 3.2% {495ms} ScriptProcess>>privateRunMsg
> 2.9% {448ms} AsyncScriptMessageSend>>value
> 2.9% {448ms} AsyncScriptMessageSend(ScriptMessageSend)>>valueWithEvent:
> 2.9% {448ms} AsyncScriptMessageSend(ScriptMessageSend)>>synchronousValueWithEvent:
> 2.9% {448ms} AsyncScriptMessageSend(ScriptMessageSend)>>synchronousValueWithArguments:event:
>
> **Leaves**
> 9.8% {1515ms} Semaphore>>wait
> 4.1% {634ms} TObjectID(SequenceableCollection)>>hasEqualElements:
> 3.9% {603ms} TCamera>>testBounds:
> 2.9% {448ms} OrderedCollection>>do:
> 2.9% {448ms} Vector3 class(Behavior)>>new:
> 2.8% {433ms} Vector3 class>>x:y:z:
> 2.6% {402ms} Vector3>>x:y:z:
> 2.5% {386ms} Vector3 class(Vector class)>>new
>
> **Memory**
> old +648,888 bytes
> young -111,464 bytes
> used +537,424 bytes
> free -537,424 bytes
>
> **GCs**
> full 0 totalling 0ms (0.0% uptime)
> incr 3068 totalling 1,170ms (8.0% uptime), avg 0.0ms
> tenures 9 (avg 340 GCs/tenure)
> root table 0 overflows
>
> On Jul 24, 2007, at 12:31 PM, Andreas Raab wrote:
>
>> Derek Arndt wrote:
>>> So I'm interested to see where the slow-downs could be, is it FFI
>>> layer or improperly handled sleeping perhaps? I appreciate any
>>> insight I can get.
>>
>> First thing is to measure it:
>> * Launch Croquet
>> * From the desktop menu choose "debug", "start MessageTally"
>> * After driving around for a little move your mouse to the top-pixel
>>   of the window (takes some targeting but can be done ;-)
>> You will get a message tally output which will tell you where you
>> spend most of your time. Start optimizing there, rinse and repeat.
>>
>> Cheers,
>> -Andreas
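
[Editorial sketch, not part of the original messages.] The "measure first" advice and the byte-array comparison hot spot can also be tried in isolation from a Squeak workspace. The snippet below is a minimal sketch under stated assumptions: a plain 16-byte ByteArray stands in for a TObjectID (the real size and class layout are not given in the thread), and the iteration count is arbitrary. It uses only the stock `Time millisecondsToRun:` and `MessageTally spyOn:` entry points.

```smalltalk
"Workspace sketch (Squeak). Assumption: a 16-byte ByteArray stands in for a TObjectID."
| a b ms |
a := ByteArray new: 16 withAll: 7.
b := a copy.

"Wall-clock time for one million equality tests, the operation that
 appears under OGLTextureManager>>textureHandleOf: in the profile."
ms := Time millisecondsToRun: [1000000 timesRepeat: [a = b]].
Transcript show: 'ByteArray = x 1000000: ', ms printString, ' ms'; cr.

"Sample the same loop with the profiler; a tally report opens when the block returns."
MessageTally spyOn: [1000000 timesRepeat: [a = b]].
```

Running this before and after installing a faster comparison gives a direct before/after number for that one operation, without the noise of the full render loop.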