Code profiling

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

Code profiling

Jeff M.
It would be nice to have some high-level profiling tools available in
Dolphin. Let me explain. Currently I can write some simple wrappers to
try and profile low-level code (ie, function-level profiling). But I
can't profile my code on-the-whole as it relates to the Dolphin VM.

Right now, I have a pretty basic, trimmed down, Direct3D application up
and running. Currently, rendering 2 (yes, TWO) triangles has a
performance of about 90 FPS. That's insanely bad. I expected Dolphin to
be a lot slower than C, but not by that much. At worst, the scene
should be running at about 300 FPS. Regardless, there are several
possible reasons for why the performance is so bad. But I can't profile
these out to really find out which it is (or a combination of them):

* Short-lived objects. Does Dolphin handle small memory allocations and
short-lived objects well? Right now I'm actually creating vertex lists
on-the-fly every frame. Yes, this is bad, but without profiling tools,
I have no idea whether this is a bottleneck or not.

* COM wrappers. What's the cost of type checking, fixing operand size
(bignum -> integer), type changing (int -> float), etc? Right now I
have no idea. But there certainly are a lot of these calls going on.

* Window event handling. Currently (for ease of use) I'm subclassing
ShellView for my window. But I don't know how much time is taken up
doing [SessionManager inputState pumpMessages]. I may want to just use
CreateWindow and PeekMessage myself. Maybe not. Still, without numbers,
I don't know.

* Message passing. I don't have a big hierarchy, or large numbers of
objects by an stretch, but for factoring reasons, things are broken up
quite a bit. What's the general cost of message passing?

* Garbage collection. Related to short-lived objects, but how often is
the GC running? When it does, how long does it take?

This is what I mean by high-level profiling. It would be nice to be
able to set a switch, and then get timings or at least percentage
breakdowns of where my code is going (because obviously gathering this
info would slow everything down). If COM wrappers are a problem area, I
need to know that. I can make more globals instead of having
short-lived objects local to a function if that's a problem. But at the
moment I'm very much stabbing in the dark.

Jeff M.


Reply | Threaded
Open this post in threaded view
|

Re: Code profiling

Chris Uppal-3
Jeff,

> It would be nice to have some high-level profiling tools available in
> Dolphin.

You didn't mention Ian Bartholomew's profiler so I guess you haven't tried that
yet.  I don't know whether it would provide precisely the information you are
looking for, but it would certainly help a lot.

    -- chris


Reply | Threaded
Open this post in threaded view
|

Re: Code profiling

Ian B
Chris,

>You didn't mention Ian Bartholomew's profiler so I guess you haven't tried that
>yet.  I don't know whether it would provide precisely the information you are
>looking for, but it would certainly help a lot.

It would help, but I don't think it would provide some of the
precision that Jeff is looking for.  There are two main problems with
the Profiler as it stands...

- it can only sample at a maximum of 1mS intervals.  This was probably
OK on the slower machines that were in use when the Profiler was first
written but is too slow now.

- it will only take a sample when Dolphin allows it to - basically
before and after a "normal" message send. Anything unusual, api calls
/ optimised methods, get included in the enclosing sample.

The original Dolphin Profiler, named Svengali, that appeared back in
the early Dolphin beta could do more (it had a full execution tracer
and a GC monitor IIRC) so it is possible to provide more information -
just a bit difficult to implement.

What I think it really needs is some sort of VM support.  Something
like a VM interrupt that, optionally, occurs after every n opcodes are
evaluated?
--
Ian

The From address is valid - for the moment


Reply | Threaded
Open this post in threaded view
|

Re: Code profiling

Jeff M.
Ian,

Once I figured out the 2 lines required to use your profiler, it
actually helped tremendously. I would still love the profiling ability
that I mentioned in the original post from Andy and Blair, but your
tool did a wonderful job (something this nice should be included with
the distribution, IMO).

It's always nice to see when 90% of the execution time is found to be
in one location (isn't that always the case?). Regardless, FPS is up to
> 1000. This is incredibly good news, and I huge worry is now gone. :-)

Thanks for taking the time to write this great tool, btw!

Jeff M.


Reply | Threaded
Open this post in threaded view
|

Re: Code profiling

Steve Alan Waring
Hi Jeff,

> I would still love the profiling ability
> that I mentioned in the original post from Andy and Blair,

I agree, this would be a useful tool.

> It's always nice to see when 90% of the execution time is found to be
> in one location (isn't that always the case?). Regardless, FPS is up to
> > 1000. This is incredibly good news, and I huge worry is now gone. :-)

Can I ask what the problem was?

Also, how are you driving your renderLoop ... are you using a process
or WM_TIMER or ?

When I read your original post I dug out my old DX8 image to have a
play around. I remember switching from a renderLoop based on a forked
process (similar to DoubleBufferedView) to a renderLoop based on
WM_TIMER and seeing a big increase in frame rate. I never figured out
why ... but in any case if you are getting 1000 fps ... you don't seem
to have a problem!

Steve
--
[hidden email]


Reply | Threaded
Open this post in threaded view
|

Re: Code profiling

Jeff M.
Steve Alan Waring wrote:
> Hi Jeff,
>
> > It's always nice to see when 90% of the execution time is found to be
> > in one location (isn't that always the case?). Regardless, FPS is up to
> > > 1000. This is incredibly good news, and I huge worry is now gone. :-)
>
> Can I ask what the problem was?

It was actually a stupid mistake on my part that is a decent
performance hic-up in Direct3D period (state changes), but the same
code in C++ isn't as big a performance hit as it was in Dolphin.
Basically, at the start of every frame I was resetting all the render
states. This actually has me concerned that the COM wrappers in Dolphin
will be (eventually) a big performance sync, and that I'll have to
buffer as much as I can. That change alone took the FPS up into the
4,000 region on my workstation at work (1K at home).

> Also, how are you driving your renderLoop ... are you using a process
> or WM_TIMER or ?

Right now, I just create my own "main" loop. It looks something like
this:

GameEngine>>run
    [view isOpen] whileTrue:
        [self postRedisplay.
        SessionManager inputState pumpMessages.
        framecount := framecount + 1].

Probably not the way I'll do it at the end of the day, but it's simple
to test and should give the best performance. Funny enough, after the
above performance bottleneck was removed, the second biggest is
"SessionManager inputState". Making that a local variable increases
performance by another 100 or so FPS (that's a BIG deal - what could
that possibly be doing that's so intensive?). Running with the profiler
revealed several such instances (example: Keyboard default).

Jeff M.


Reply | Threaded
Open this post in threaded view
|

Re: Code profiling

Steve Alan Waring
Hi Jeff,

> code in C++ isn't as big a performance hit as it was in Dolphin.

Just out of interest, for the kind of game you are building, if it was
in C++, what is the percentage of time it would spend in "user" code vs
directX itself?

> This actually has me concerned that the COM wrappers in Dolphin
> will be (eventually) a big performance sync, and that I'll have to
> buffer as much as I can.

You can optimize the creation of ExternalStructure instances fairly
easily ... the base system does this in a number of places ... for
example RECT class>>left:top:right:bottom:

Classes like StructureArrays, which involve an additional layer of
moving bytes around, can also be easily optimized by creating your own
specialized constructors or specialized classes.

I think the bottom line is that if you really need/want to, you can
bypass the COM ExternalStructure wrappers and just work directly with
ByteArrays ... which is relatively very fast for Dolphin.

> "SessionManager inputState". Making that a local variable increases
> performance by another 100 or so FPS (that's a BIG deal - what could
> that possibly be doing that's so intensive?). Running with the profiler
> revealed several such instances (example: Keyboard default).

I am not sure. Ian's profiler is very useful, but when you get down to
the leaves, it can be misleading at times as to what exactly is taking
the time ... especially when external interfacing is involved.

I have never worked on such performance critical code that I would
notice the difference between a global access and a temporary ... but I
have seen Blair and Andy do it in the base system.

Steve
 --
[hidden email]


Reply | Threaded
Open this post in threaded view
|

Re: Code profiling

Chris Uppal-3
In reply to this post by Ian B
Ian,

> - it can only sample at a maximum of 1mS intervals.  This was probably
> OK on the slower machines that were in use when the Profiler was first
> written but is too slow now.

A finer granularity would probably help, but I find the current 1ms interval to
work OK.  It's just a matter of ensuring that the code runs for a decent period
(several seconds at least) under the profiler.  That means that I have to run
more loops now than in the old days -- but the test doesn't take any longer to
execute.


> What I think it really needs is some sort of VM support.  Something
> like a VM interrupt that, optionally, occurs after every n opcodes are
> evaluated?

That might be interesting too.

But just a thought -- the profiler probably hasn't been working as well as it
should on the current D6 VM since (as Blair mentioned a little while back) the
VM wasn't checking for (speaking broadly) interrupt conditions as often as it
should.

    -- chris


Reply | Threaded
Open this post in threaded view
|

Re: Code profiling

Chris Uppal-3
In reply to this post by Jeff M.
Jeff,

> GameEngine>>run
>     [view isOpen] whileTrue:
>         [self postRedisplay.
>         SessionManager inputState pumpMessages.
>         framecount := framecount + 1].
>
> Probably not the way I'll do it at the end of the day, but it's simple
> to test and should give the best performance. Funny enough, after the
> above performance bottleneck was removed, the second biggest is
> "SessionManager inputState". Making that a local variable increases
> performance by another 100 or so FPS (that's a BIG deal - what could
> that possibly be doing that's so intensive?).

That pattern:

    SessionManager inputState

is used all over in Dolphin, and it's worth keeping an eye open for occurrences
in the inner loops of performance-critical code.  But it's not /that/ big a
drag -- on my 1.5 GHz machine that expression takes 77 nanoseconds in D6[*], so
it's hard to see why executing it a thousand or so times a second is making a
detectable difference to your loop.

Incidentally, Dolphin's external call mechanism is pretty fast, but in the
usual pattern of using external libraries, with calls like:
        MyExternalLib default doSomethingWith: someValue.
the overhead of:
    MyExternalLib default
is larger than the overhead of the external call itself.  Which is one reason
why some of the most heavily used ExternalLibraries are subclassed from
PermanentLibrary -- which has a faster, but less flexible, implementation of
class-side #default.

    -- chris


[*] Down from 235 nanos in D5 !


Reply | Threaded
Open this post in threaded view
|

Re: Code profiling

Jeff M.
Chris,

Thanks for the input! I haven't been able to completely test the
external calls and COM wrappers for timings yet, but your comments
boost my confidence. Once I got this basic hic-up out of the way, I've
been able to plow ahead quite successfully, and each addition from here
on has had very little negative effect on framerate.

Jeff M.


Reply | Threaded
Open this post in threaded view
|

Re: Code profiling

Pavel-2
In reply to this post by Ian B
> - it can only sample at a maximum of 1mS intervals.  This was probably
> OK on the slower machines that were in use when the Profiler was first
> written but is too slow now.
>
What is "sample interval"? I can't to find any help for it . :(

Pavel


Reply | Threaded
Open this post in threaded view
|

Re: Code profiling

Ian B
Pavel,

>What is "sample interval"? I can't to find any help for it . :(

The profiler takes a snapshot of the current execution stack sleeps
for a bit, takes another snapshot, sleeps for a bit etc.   The "sample
interval" just defines the number of milliseconds it sleeps between
taking snapshots.
--
Ian

The From address is valid - for the moment


Reply | Threaded
Open this post in threaded view
|

Re: Code profiling

Pavel
Tnx!!!