Audio and Video Object Analysis

Audio and Video Object Analysis

Kirk Fraser
I will be working toward making Cuis programmable by voice.  There are many voice transcribers which do not transcribe my voice accurately.  The first step is getting audio input, then using the Fast Fourier Transform (the FFT package in Cuis 4.2) to convert it from the time domain to the frequency domain, so that patterns of spoken syllables and characters can be recognized.  One idea I read recently is to use overlapping frames to capture speech events that may be longer or shorter depending on speaking speed.  I don't yet know how to display FFT output to look for patterns, or how to match a pattern like a triangle that may be equilateral in one speech sample but isosceles in another.  I hope to eventually detect some nuanced attributes, like the feelings being communicated.  Is anyone interested in collaborating on this project?
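
To make the overlapping-frames idea concrete, here is a minimal sketch in Python (not Cuis code, and not the Cuis FFT package). It slices a signal into frames that overlap by half, applies a Hann window, and takes the magnitude spectrum of each frame. The sample rate, frame size, hop size, and test tone are all made-up values, and the function names are hypothetical.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def overlapping_frames(signal, frame_size, hop):
    """Yield frames of frame_size samples, each starting hop samples after the last."""
    for start in range(0, len(signal) - frame_size + 1, hop):
        yield signal[start:start + frame_size]


def spectrogram(signal, frame_size=64, hop=32):
    """Return one magnitude spectrum (bins 0..frame_size/2) per overlapping frame."""
    # A Hann window tapers each frame toward zero at its edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_size)
              for i in range(frame_size)]
    rows = []
    for frame in overlapping_frames(signal, frame_size, hop):
        windowed = [s * w for s, w in zip(frame, window)]
        spectrum = fft(windowed)
        rows.append([abs(c) for c in spectrum[:frame_size // 2 + 1]])
    return rows


# Synthetic test tone: 1000 Hz sampled at 8000 Hz (both values are assumptions).
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(256)]
rows = spectrogram(tone)
peak_bin = max(range(len(rows[0])), key=lambda k: rows[0][k])
peak_hz = peak_bin * SAMPLE_RATE / 64
```

Each row of `rows` is one time slice, so printing or plotting the rows one under another gives a rough spectrogram in which patterns can be looked for by eye.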

I'm also interested in video object analysis.  This is to let Cuis know what it is seeing so Cuis can be used in robotic applications like the Google self-driving cars, or street optical character recognition for blind people or foreign-language speakers.  The first step is getting a video stream into Cuis, which has been done in Squeak, then capturing and processing individual frames with arc, line, and corner detection to match patterns.  A set of pattern match methods will identify visual object classifications for more detailed analysis until the scene is described in text.  Another step would be to analyze camera motion across multiple frames to produce more accurate descriptions of 3D objects.  Is anyone interested in collaborating on this project?
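
As a rough illustration of the per-frame processing step, the sketch below applies a Sobel operator to a tiny grayscale grid. This is one common starting point for line and edge detection, not necessarily the approach intended in the post. It is plain Python on nested lists; the 6x6 test frame and the function name are invented for the example.

```python
def sobel_magnitude(image):
    """Return approximate gradient magnitudes for the interior pixels of a 2D grid."""
    height, width = len(image), len(image[0])
    result = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal gradient: right column minus left column, centre row weighted 2.
            gx = (image[y - 1][x + 1] + 2 * image[y][x + 1] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y][x - 1] - image[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row, centre column weighted 2.
            gy = (image[y + 1][x - 1] + 2 * image[y + 1][x] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y - 1][x] - image[y - 1][x + 1])
            # |gx| + |gy| is a cheap stand-in for sqrt(gx^2 + gy^2).
            result[y][x] = abs(gx) + abs(gy)
    return result


# Test frame: dark left half (0), bright right half (255), so one vertical edge.
frame = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
edges = sobel_magnitude(frame)
edge_columns = sorted({x for row in edges for x, v in enumerate(row) if v > 0})
```

The non-zero responses cluster in the two columns on either side of the brightness step; thresholding such a map is a typical first move before trying to fit lines, arcs, or corners.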

Re: Audio and Video Object Analysis

Kirk Fraser
Is it possible to do voice recognition in real time in Cuis?  I just ran some of the new Cuis package FFT demos and the plot speed is very slow compared to what is needed for voice recognition, in my opinion.  On the other hand, it still might be useful to get the software working slowly in Cuis on recorded audio files, then convert it to C++ or some other language for real-time voice input, such as from a laptop microphone.

What is making it slow?  I'm using the latest Cog.  Would it help enough to build a multi-core machine to avoid most of the operating system overhead?  Is there some switch in Cuis to speed things up?  Thanks.


Re: Audio and Video Object Analysis

Kirk Fraser
More speed ideas...

I read of a Neural Net algorithm that was made faster by conversion from floating point to integer math.  Is it possible to convert FFT from floating point to integer?  I see no author tag on the code, just a book reference.  Does anyone understand FFT well enough to explain how it works in enough detail to redesign it to use integers?
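
For what it's worth, integer (fixed-point) FFTs do exist. A minimal sketch of the idea, in Python rather than Smalltalk and unrelated to the Cuis code, follows. It stores the sine and cosine tables as Q15 integers (value times 2^15) and shifts each product back down, so the transform itself uses only integer multiplies, adds, and shifts. All names here are hypothetical. Python integers never overflow, so this sketch does not show the per-stage scaling a fixed-width implementation would need.

```python
import math

Q = 15                       # Q15 fixed point: real value = integer / 2**15
ONE = 1 << Q


def twiddles(n):
    """Precompute cos/sin tables as Q15 integers (floats are used only here)."""
    cos_t = [round(math.cos(2 * math.pi * k / n) * ONE) for k in range(n // 2)]
    sin_t = [round(math.sin(2 * math.pi * k / n) * ONE) for k in range(n // 2)]
    return cos_t, sin_t


def int_fft(re, im, cos_t, sin_t, stride=1):
    """Recursive radix-2 FFT on integer lists; returns (re, im) as integer lists."""
    n = len(re)
    if n == 1:
        return re[:], im[:]
    er, ei = int_fft(re[0::2], im[0::2], cos_t, sin_t, stride * 2)
    odd_r, odd_i = int_fft(re[1::2], im[1::2], cos_t, sin_t, stride * 2)
    out_re, out_im = [0] * n, [0] * n
    for k in range(n // 2):
        c, s = cos_t[k * stride], sin_t[k * stride]
        # (c - j*s) * (odd_r + j*odd_i), then shift back down from Q15.
        tr = (c * odd_r[k] + s * odd_i[k]) >> Q
        ti = (c * odd_i[k] - s * odd_r[k]) >> Q
        out_re[k], out_im[k] = er[k] + tr, ei[k] + ti
        out_re[k + n // 2], out_im[k + n // 2] = er[k] - tr, ei[k] - ti
    return out_re, out_im


# Integer test input: a cosine at bin 2 of a 16-point transform, amplitude 1000.
N = 16
cos_t, sin_t = twiddles(N)
samples = [round(1000 * math.cos(2 * math.pi * 2 * t / N)) for t in range(N)]
re, im = int_fft(samples, [0] * N, cos_t, sin_t)
```

The energy lands in bins 2 and 14 at roughly 8000 (N/2 times the amplitude), off by a few units because each shift discards low-order bits. Whether this beats floating point depends heavily on the hardware, so it is worth measuring before committing to it.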

Some for-profit Smalltalk versions have shown signs of being fast on graphics, such as bouncing ball demos.  Is there some tweak that can be ported to Cuis?

If there are no fixes for Cuis and rewriting FFT is in theory not possible, then is there another language I should be looking at?  A faster Smalltalk?  Should I try to reverse engineer the older but simpler Digitalk Smalltalk/V to use Cog? 

I remember an old Borland C++ demo pasted circles to a much older computer display quite fast. What is the fastest object language that gives maximum peripheral access?  Or is there another solution I should be looking at?

Thanks anyone...  



Re: Audio and Video Object Analysis

Casey Ransberger-2
Start with some benchmarks. Without those, you haven't got a strong definition of what "faster" or "slow" means. Compare the numbers with other implementations. 

Fourier transforms aren't exactly the bedrock of my experience. Without looking it up I can't say what the big O ends up being there. The trick is to see how far off from the known optimal a given system ends up being; small variations are likely to be a consequence of implementation and platform details. Large variations are usually about using the wrong algorithm. 

Anyway most of the time, if something's just a whole lot slower than it seems like it should be, there's usually some high level code that isn't using the best known algorithm. It's usually best to rule that out first. Profiling is usually the best way to find the bottleneck(s).
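
As an illustration of "benchmark first, then profile", the Python sketch below times a direct O(n^2) DFT against a recursive O(n log n) FFT on the same input, then profiles the slower one. It is a stand-in only: in Cuis the equivalent would use the image's own timing and profiling tools, which are not shown here. The function names and input signal are invented for the example.

```python
import cmath
import cProfile
import io
import math
import pstats
import time


def naive_dft(x):
    """Direct O(n^2) evaluation of the DFT definition."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fft(x):
    """Recursive radix-2 FFT, O(n log n); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + tw[k] for k in range(n // 2)] +
            [even[k] - tw[k] for k in range(n // 2)])


def best_time(func, data, repeats=3):
    """Best-of-N wall-clock time in seconds for func(data)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(data)
        best = min(best, time.perf_counter() - start)
    return best


signal = [math.sin(0.3 * t) for t in range(256)]
timings = {f.__name__: best_time(f, signal) for f in (naive_dft, fft)}

# Profile one call to see where the time goes, sorted by cumulative time.
profiler = cProfile.Profile()
profiler.enable()
naive_dft(signal)
profiler.disable()
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
```

Printing `timings` and `report.getvalue()` gives the two numbers to compare and a breakdown of the slow path. A large gap between two implementations of the same transform is the "wrong algorithm" signal described above.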

If that fails, and the actual bottleneck turns out to be intrinsic to Smalltalk's semantics or the specifics of the virtual machine in use (FWIW this almost never happens in my experience), the usual approach is to implement a plugin or primitive for the VM. This can be done either in C or in Slang. In most cases the latter will be preferable, as Slang code can be run in-image and take advantage of the system's inspection and debugging facilities. 

Good hunting, and I hope this helps. 

Casey



In reply to Kirk Fraser's post of Dec 15, 2013, 10:55 PM.

View this message in context: Re: Audio and Video Object Analysis
Sent from the Cuis Smalltalk mailing list archive at Nabble.com.
_______________________________________________
Cuis mailing list
[hidden email]
http://jvuletich.org/mailman/listinfo/cuis_jvuletich.org


Re: Audio and Video Object Analysis

Kirk Fraser
OK, thanks.  I get profiling, but what is "Slang"?

Re: Audio and Video Object Analysis

Casey Ransberger-2
It's a subset of Smalltalk with C-like semantics which can be automatically translated to C code; this is how we arrive upon our virtual machine. 

The whole story is here:


In reply to Kirk Fraser's post of Dec 16, 2013, 3:50 PM.

Re: Audio and Video Object Analysis

Kirk Fraser
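How does one hook up an existing .dll to a class definition in Cuis?  For example, Open Computer Vision
http://docs.opencv.org/doc/tutorials/introduction/windows_install/windows_install.html
download appears to have several .dlls at:
http://sourceforge.net/projects/opencvlibrary/files/opencv-win/2.4.7/
They also have a Linux install at
http://docs.opencv.org/doc/tutorials/introduction/linux_install/linux_install.html?highlight=linux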

Or would I be ahead to just lower myself into the pit of Python and Numpy for real time speed?  I understand some state of the art research teams for driverless cars did that. 



Re: Audio and Video Object Analysis

Casey Ransberger-2
So Python has some great libraries for numerical and scientific computing, but...

I'm afraid you're optimizing prematurely here. I get that you don't think Smalltalk is going to be "fast enough" to meet your goals. Again, without doing some experimentation there's really no way of validating that concern. 

The usual wisdom here is a process like:

1. Make it work
2. Make it work right
3. Make it work fast 

A relevant example (I think, feel free to correct me if I'm wrong here) is the way Juan approached the development of Morphic 3. He wrote it first in Smalltalk, to get it working, adapting Cuis to suit as development continued, but ultimately it will be incorporated in the VM as a new plugin. 

Slang provides a fairly smooth path to high performance computing in the world of Smalltalk. One has to do some work smoothing the code into the limited subset that Slang presents, but it's much less work than rewriting a bunch of wildly high-level Smalltalk code in C by hand. 

As for interfacing with other languages, depending on what you want to do, there are a few approaches. If you have some service written with another platform, using an HTTP based solution is usually the easiest, if not the most performant. If you just need a quick hack to glue some stuff together, OSProcess might not be a bad thing to explore. For heavier duty stuff, you're into FFI territory. 
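
To give a feel for the "quick hack" end of that range, here is what gluing to an external program through a child process can look like. This is Python's `subprocess` module, not OSProcess, whose API is not shown here. The child program is a stand-in written inline so the example is self-contained; the JSON-over-stdin protocol and the `call_external` name are assumptions made for the sketch.

```python
import json
import subprocess
import sys

# Stand-in for an external tool: a child Python process that reads a JSON list
# of numbers on stdin and writes their sum as JSON on stdout.
CHILD_SOURCE = "import json, sys; print(json.dumps(sum(json.load(sys.stdin))))"


def call_external(numbers):
    """Send numbers to the child process and return its decoded reply."""
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        input=json.dumps(numbers),
        capture_output=True,
        text=True,
        check=True,          # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)


result = call_external([1, 2, 3.5])
```

Process startup and text serialization make this far too slow for per-frame audio or video work, which is why heavier-duty use tends to end up at FFI or a VM plugin.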

If you do become an FFI wizard, do consider documenting the experience. There really isn't enough documentation on that stuff. I've used it but barely. It was a bit like feeling one's way about a cluttered attic in the dark. 

Anyway I can't help you very much, but I might be able to point you at useful avenues of inquiry. Hopefully this is that. I will say that the things you want to do are pretty cool, so keep at it and don't give up!

Casey

In reply to Kirk Fraser's post of Dec 16, 2013, 10:42 PM.

Re: Audio and Video Object Analysis

Kirk Fraser
Casey,
I agree, thanks for reminding me of the steps to make it work, make it work right, and make it work fast.  On the topic of Slang, I have yet to see it anywhere other than your posts.  The paper you sent a link to did not contain the word "Slang" and neither does it appear when doing a find Class in Cuis.  Do you have a link or class with example Slang code?

I like your example of feeling in the dark doing FFI, that is one problem with Smalltalk - so little documentation.  It seems original programmers expect following programmers to just know what they did and why.  Or how to do something with it that should be obvious like how to get rid of a morph.

Juan,
Cuis 4.2 has a glitch in that department which 4.1 didn't have.  Now when you minimize a window it provides a window icon at the bottom with a label but the label morph doesn't go away when you expand the window.  Is there an easier way than returning to the original download and reinstalling Cuis?  In Cuis 1.2 all one needs to do is control click on the morph to get a menu to delete it.  What is the way now?

Re: Audio and Video Object Analysis

Juan Vuletich-4
In reply to this post by Kirk Fraser
Hi Kirk,

(inline)

On 12/16/2013 3:55 AM, Kirk Fraser wrote:
> More speed ideas...
>
> I read of a Neural Net algorithm that was made faster by conversion
> from floating point to integer math.  Is it possible to convert FFT
> from floating point to integer?  I see no author tag on the code, just
> a book reference,  Does anyone understand FFT enough to help explain
> how it works in enough detail to redesign it to use integers?

The FFT in Cuis is running in a VM Plugin. Original source code is in
Smalltalk / Slang, as part of the VMMaker package. This is converted to
C. The C code is compiled and bundled with the VM. Most likely it is
running as fast as C can run in your platform. And most likely it is
fast enough. But you need to a) profile and compare and b) specify (and
understand) how fast you need it to be.
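
One way to pin down "how fast you need it to be" for streaming audio is a per-frame time budget: if frames advance by a fixed number of samples, each frame must be processed in less time than those samples take to arrive. The Python sketch below computes that budget and a real-time factor. The 16 kHz rate, frame and hop sizes, and the placeholder workload are all assumed figures, not measurements of Cuis.

```python
import time


def frame_budget_seconds(hop, sample_rate):
    """Seconds of new audio arriving per frame when frames advance by hop samples."""
    return hop / sample_rate


def realtime_factor(process_frame, frames, hop, sample_rate):
    """Average processing time per frame divided by the per-frame budget.

    Below 1.0 the processing keeps up with the input; above 1.0 it falls behind.
    """
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
    per_frame = (time.perf_counter() - start) / len(frames)
    return per_frame / frame_budget_seconds(hop, sample_rate)


def frame_energy(frame):
    """Placeholder workload; a real system would run its FFT and matching here."""
    return sum(s * s for s in frame)


# Assumed figures: 16 kHz audio, 512-sample frames advancing 256 samples at a time.
SAMPLE_RATE, FRAME_SIZE, HOP = 16000, 512, 256
frames = [[0.001 * i for i in range(FRAME_SIZE)] for _ in range(200)]
budget_ms = 1000 * frame_budget_seconds(HOP, SAMPLE_RATE)
factor = realtime_factor(frame_energy, frames, HOP, SAMPLE_RATE)
```

With these figures the budget is 16 ms per frame. Swapping the real per-frame work in for `frame_energy` turns "too slow" into a number that can be compared across implementations.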

> Some for profit Smalltalk versions have shown signs of being fast on
> graphics such as bouncing ball demos.  Is there some tweak that can be
> ported to Cuis?

Again, without a profile and a well done comparison, this is meaningless.

> If there are no fixes for Cuis and rewriting FFT is in theory not
> possible, then is there another language I should be looking at?

There will always be possible fixes for Cuis and for any other software.
Software can always be enhanced. But it is not as simple as asking "is
there a fix to be done?"

> A faster Smalltalk?  Should I try to reverse engineer the older but
> simpler Digitalk Smalltalk/V to use Cog?
> I remember an old Borland C++ demo pasted circles to a much older
> computer display quite fast. What is the fastest object language that
> gives maximum peripheral access?

All high level languages (including all Smalltalks, but also any other
object language) give display access via C or assembler. You need a
much deeper understanding of the performance problem you are trying to
fix before looking for a solution.

>  Or is there another solution I should be looking at?
>
> Thanks anyone...

Good luck!
Juan Vuletich


Re: Audio and Video Object Analysis

Juan Vuletich-4
In reply to this post by Kirk Fraser
On 12/17/2013 3:42 AM, Kirk Fraser wrote:
> How does one hook up an existing .dll to a class definition in Cuis?
>  For example, Open Computer Vision
> http://docs.opencv.org/doc/tutorials/introduction/windows_install/windows_install.html 
> download appears to have several .dlls at:
> http://sourceforge.net/projects/opencvlibrary/files/opencv-win/2.4.7/
> They also have a Linux install at
> http://docs.opencv.org/doc/tutorials/introduction/linux_install/linux_install.html?highlight=linux

You can use the FFI package. Or you can write a VM plugin that calls the
DLL, giving the Smalltalk side a potentially better API. Most likely not
trivial work.

> Or would I be ahead to just lower myself into the pit of Python and
> Numpy for real time speed?  I understand some state of the art
> research teams for driverless cars did that.

Cuis with the appropriate plugins can be as fast as Python + Numpy, with
the advantage that stuff is easier to develop and evolve, because it is
all written in Smalltalk. Python + Numpy makes the C part too opaque for
my taste. Anyway, it is your call.

Cheers,
Juan Vuletich


Re: Audio and Video Object Analysis

Juan Vuletich-4
In reply to this post by Kirk Fraser
(abridged)

On 12/17/2013 1:53 PM, Kirk Fraser wrote:
> ... that is one problem with Smalltalk - so little documentation.  It
> seems original programmers expect following programmers to just know
> what they did and why.

Well, the idea is that documentation is secondary to code. Smalltalk
code is to be read and understood by humans.

>  Or how to do something with it that should be obvious like how to get
> rid of a morph.
>
> Juan,
> Cuis 4.2 has a glitch in that department which 4.1 didn't have.  Now
> when you minimize a window it provides a window icon at the bottom
> with a label but the label morph doesn't go away when you expand the
> window.

Please, give a more detailed list of steps to reproduce the problem. I
just tried minimizing a window (clicking on the orange button at the top
left) and then clicking on the thumbnail in the taskbar. I didn't see any
problem.

> Is there an easier way than returning to the original download and
> reinstalling Cuis?  In Cuis 1.2 all one needs to do is control click
> on the morph to get a menu to delete it.  What is the way now?

To remove a window, click on the top left red button with an 'x' on it.
To remove any morph, middle click on it to open a halo. The top left
handle is the 'delete' handle. WRT an easier way, it depends on what
you want to do. If you want Cuis 1.2, yes, the easiest way is to
download it and use it. If you want to understand the latest Cuis, the
easiest way is to play with it and study it.

Cheers,
Juan Vuletich



Re: Audio and Video Object Analysis

Casey Ransberger-2
In reply to this post by Kirk Fraser
In the Back to the Future article, Slang is described in the section under the heading "Smalltalk to C Translation," but you're right, the term "Slang" itself is not mentioned in the article. The term may not have been coined yet, though the paper was published the same year that the term came into use. I hadn't noticed that it wasn't in there!

The wiki article is here (and could use an update):


To see examples of this stuff, I'd say the easiest way would be to install VMMaker in a recent Squeak image and look at the code for the interpreter VM (NOT Cog.)

To get a sense of the design of the interpreter VM, Tim Rowledge wrote a great article on that:



In reply to Kirk Fraser's post of Dec 17, 2013, 8:53 AM.

Re: Audio and Video Object Analysis

Kirk Fraser
Thank you for the articles.  I spent a lot of time today looking at Python and I do like Smalltalk best.  But tell me, would you trust Cuis or any Smalltalk to run your driverless car - during garbage collection?  



Re: Audio and Video Object Analysis

KenDickey
On Tue, 17 Dec 2013 21:37:34 -0800 (PST)
Kirk Fraser <[hidden email]> wrote:

> Thank you for the articles.  I spent a lot of time today looking at Python
> and I do like Smalltalk best.  But tell me, would you trust Cuis or any
> Smalltalk to run your driverless car - during garbage collection?

Yes. Dynamic (garbage collected) languages are used in real-time applications.  Interestingly, one area this came up in was interactive video games which drove some GC research in real systems.

For "hard real time" (guaranteed duty cycle) you need to plan and measure.  Many objects are pre-allocated and reused.  There are also GC algorithms which cost more per allocation but bound worst-case GC time [e.g. treadmill, train].  You have to measure object allocation rate(s) and know what is going on with the system -- just like the rest of the car.
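
The "pre-allocated and reused" point can be sketched as a fixed-size buffer pool: everything is allocated once at startup, and the steady-state loop only borrows and returns buffers. The example is Python for illustration only. The `BufferPool` class, its sizes, and the frame loop are hypothetical, and Python's own allocator behaves differently from a Smalltalk VM's.

```python
class BufferPool:
    """Fixed set of preallocated buffers, handed out and returned for reuse."""

    def __init__(self, count, size):
        # All allocation happens here, before the time-critical loop starts.
        self._free = [bytearray(size) for _ in range(count)]
        self.capacity = count

    def acquire(self):
        """Borrow a buffer; fail loudly rather than allocate a new one."""
        if not self._free:
            raise RuntimeError("pool exhausted: capacity was sized too small")
        return self._free.pop()

    def release(self, buf):
        """Return a buffer to the pool; the next user overwrites its contents."""
        self._free.append(buf)


pool = BufferPool(count=2, size=1024)
seen = set()
for frame_number in range(1000):
    buf = pool.acquire()
    buf[0] = frame_number % 256      # pretend to fill the buffer with frame data
    seen.add(id(buf))
    pool.release(buf)
distinct_buffers = len(seen)
```

However many frames pass through, the loop never touches more buffers than the pool holds. The exhaustion error doubles as a measurement: if it ever fires, the capacity estimate was wrong.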

In general you use the old engineering strategy, excess capacity.  You need to measure to know what the capacity is.

Cheers,
-KenD

