MultiByteFileStream performance issues

timrowledge
The most recent nuScratch beta for the Pi takes a very long time to start up, which is the focus of my work today.

A moment’s profiling shows that processing the 50 language data files (they’re ‘pootle’ files, apparently) is taking ~27 secs on my Pi. Actually, for 50 files that seemed quite plausible until I checked the old Scratch image: 621 ms. Same files and actually a significantly slower VM. Hmm.

97% of the slow time was in MultiByteFileStream>>contentsOfEntireFile, with half spent in #next. By contrast, deliberately using a StandardFileStream took just 300 ms. Of course, that means having to manually handle the potential unicode transformations, which is going to be fun.
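
A minimal workspace sketch of how such a comparison could be timed, assuming a Squeak image of that era. The file path is hypothetical, and this is not the code actually used for the measurements quoted above:

```smalltalk
"Time reading one file through each stream class.
 The path below is a placeholder; substitute a real language file."
| path multiByteMs standardMs |
path := 'locale/de.po'.

"MultiByteFileStream reads through its text converter."
multiByteMs := Time millisecondsToRun: [
	(MultiByteFileStream readOnlyFileNamed: path) contentsOfEntireFile].

"StandardFileStream returns the raw bytes with no conversion."
standardMs := Time millisecondsToRun: [
	(StandardFileStream readOnlyFileNamed: path) contentsOfEntireFile].

Transcript
	show: 'MultiByteFileStream: ', multiByteMs printString, ' ms'; cr;
	show: 'StandardFileStream: ', standardMs printString, ' ms'; cr.
```

Wrapping the first read in `MessageTally spyOn: [...]` instead of `Time millisecondsToRun:` should give a per-method breakdown along the lines of the one described here.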

Since MultiByteFileStream is the default (see FileStream concreteStream) this may well mean that a lot of time is being wasted when generally loading files. Does MC rely upon the default, for example?

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Useful random insult:- Can't program his way out of a for-loop.



Re: MultiByteFileStream performance issues

Levente Uzonyi-2
On Fri, 13 Jun 2014, tim Rowledge wrote:

> The most recent nuScratch beta for the Pi takes a very long time to start up, which is the focus of my work today.
>
> A moment’s profiling shows that processing the 50 language data files (they’re ‘pootle’ files, apparently) is taking ~27 secs on my Pi. Actually, for 50 files that seemed quite plausible until I checked the old Scratch image: 621 ms. Same files and actually a significantly slower VM. Hmm.
>
> 97% of the slow time was in MultiByteFileStream>>contentsOfEntireFile, with half spent in #next. By contrast, deliberately using a StandardFileStream took just 300 ms. Of course, that means having to manually handle the potential unicode transformations, which is going to be fun.

What encoding do those files have?


Levente


Re: MultiByteFileStream performance issues

timrowledge

On 13-06-2014, at 5:16 PM, Levente Uzonyi <[hidden email]> wrote:
>
> What encoding do those files have?

They all seem to be UTF-8 and they all get dealt with correctly so far as I can tell, though very, very slowly. We’re talking maybe 75x as long to deal with each one. If that is a common ratio we’re wasting a lot of time with files.

Swapping this particular bit of code to use a StandardFileStream and then ‘manually’ converting the relevant part to utf8 results in processing all 50 files taking 160 ms on the Pi instead of 27000 ms.
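
A rough sketch of that kind of workaround, not the actual nuScratch change: read the raw bytes with a StandardFileStream and decode them in one pass afterwards. It assumes the image provides String>>utf8ToSqueak, as Squeak images of that period generally do, and it uses a hypothetical file path. It also decodes the whole file, whereas the change described above converts only the relevant part.

```smalltalk
"Read the file without per-character conversion, then decode UTF-8 once.
 The path is a placeholder, not one of the real Scratch files."
| path rawBytes text |
path := 'locale/de.po'.

"contentsOfEntireFile reads everything and closes the stream."
rawBytes := (StandardFileStream readOnlyFileNamed: path) contentsOfEntireFile.

"Assumption: utf8ToSqueak is available in this image; the result may be a
 WideString if the file contains characters outside Latin-1."
text := rawBytes utf8ToSqueak.
```

The point of the design is to do the conversion once over the whole string rather than once per character inside #next.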


tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Fractured Idiom:- ICH LIEBE RICH - I'm really crazy about having dough