Fuel serialize of 70MB takes forever on Linux vs. Mac


Andreas Brodbeck-3
Hi all!

I have a performance problem with Fuel (latest stable) on Pharo 6.1
(32-bit) on Ubuntu 16.04.3 LTS. Serializing and storing 70MB of data to
disk takes more than 6 minutes, while the same task on an equally
powerful Mac takes only 1.5 minutes.

My observation was that on Linux it is very, very slow while writing the
file. The first 30MB of the file are written very fast, but after that
the writing speed drops dramatically.

On the same Linux server, with the application running on an old Pharo
1.4, serialization was always as fast as on the Mac.

Attached is an excerpt of the Pharo Time Profiler output for the
serialization. The profiling tree looks the same on the Mac, just with
faster times.

What could be the hidden speed-eating secret of Pharo 6.1 on Linux? I
have absolutely no idea where to look! Memory? Streams? File-writing
support? Something inside Fuel?


Thanks for any help!

Cheers, Andreas

--
Andreas Brodbeck
www.mindclue.ch

Attachment: 2017-12-18_1719.png (138K)

Re: Fuel serialize of 70MB takes forever on Linux vs. Mac

Henrik-Nergaard
Hi Andreas,

It looks like you may have problems with hash collisions in
FLLargeIdentityDictionary.
What are the tally sizes of the FLLargeIdentityDictionary instances used
when you serialize? If these have a tally larger than ~75% of the
available size (4096 items), there may well be some performance loss.
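To see why a bucket count fixed at 4096 could matter once a table holds 300,000 entries, here is a back-of-the-envelope model in Python. It is a hypothetical sketch, not Fuel's implementation: it assumes the table has exactly 4096 buckets, that keys are spread as evenly as possible across them, and that a lookup compares the key against the entries of its bucket one by one. The names `bucket_sizes` and `mean_comparisons` are made up for illustration.

```python
"""Back-of-the-envelope model of a fixed-bucket identity table.

Hypothetical assumptions (not Fuel's source code): the table has a fixed
number of buckets, keys are spread as evenly as possible across them, and
a lookup compares the key against each entry of its bucket in order.
"""

FIXED_BUCKETS = 4096  # the bucket count mentioned in the thread


def bucket_sizes(num_items: int, num_buckets: int = FIXED_BUCKETS) -> list[int]:
    """Bucket sizes when num_items keys are spread as evenly as possible."""
    base, extra = divmod(num_items, num_buckets)
    # 'extra' buckets hold one more entry than the others
    return [base + 1] * extra + [base] * (num_buckets - extra)


def mean_comparisons(num_items: int, num_buckets: int = FIXED_BUCKETS) -> float:
    """Average comparisons for a successful lookup of every stored key.

    In a bucket of k entries, finding the i-th entry costs i comparisons,
    so the whole bucket contributes 1 + 2 + ... + k = k * (k + 1) / 2.
    """
    total = sum(k * (k + 1) // 2 for k in bucket_sizes(num_items, num_buckets))
    return total / num_items


if __name__ == "__main__":
    for n in (3_000, 30_000, 300_000):
        print(f"{n:>7} items: {mean_comparisons(n):6.2f} comparisons per lookup")
```

An even spread is the best case for this model (any clustering of identity hashes only lengthens some buckets), yet the average lookup cost still grows linearly with the number of entries: 1 comparison at 3,000 items, roughly 37 at 300,000. If the real table behaved like this, total work would grow roughly quadratically with the object count, which would be consistent with a serialization that starts fast and then slows down.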

You could check whether file writing is the problem by measuring the time
it takes to serialize only in memory.
You can try to use "FLSerializer serializeToByteArray:" and see if it
gives better performance.

Best regards,
Henrik



--
Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html


Re: Fuel serialize of 70MB takes forever on Linux vs. Mac

Ben Coman

On 19 December 2017 at 00:35, Andreas Brodbeck <[hidden email]> wrote:
> I have a performance problem with Fuel (latest stable) on a Pharo 6.1
> (32bit) on Ubuntu 16.04.3 LTS.

On 19 December 2017 at 03:45, Henrik-Nergaard <[hidden email]> wrote:
> You could check if file writing is the problem by measuring the time it
> takes to only serialize in memory.
> You can try to use: "FLSerializer serializeToByteArray: " and see if it
> gives better performance?

 
Side comment: I wonder what the performance would be if
#serializeToByteArray: were hooked up directly to mmap?

cheers -ben

Re: Fuel serialize of 70MB takes forever on Linux vs. Mac

Andreas Brodbeck-3
In reply to this post by Henrik-Nergaard
On 18.12.17 at 20:45, Henrik-Nergaard wrote:
> Hi Andreas,
>
> It looks like you may have problems with hash collisions in
> FLLargeIdentityDictionary.
> What are the tally sizes of the FLLargeIdentityDictionary used when you
> serialize? (If these have a tally larger than ~ 75% of the available size
> (4096 items), then there may well be some performance loss).

Well, hm, I have FLLargeIdentityDictionary instances with up to 300,000
items... But that's what "large" means, right?

Is there anything I can do here? I don't think so.

>
> You could check if file writing is the problem by measuring the time it
> takes to only serialize in memory.
> You can try to use: "FLSerializer serializeToByteArray: " and see if it
> gives better performance?

Thanks for this hint! I did that and measured the same crazy
difference: 6 minutes vs. 1.5 minutes.

So: It's not the storage! It's something inside Fuel. I'll keep
investigating and will post updates here.

Cheers, Andreas


--
Andreas Brodbeck
www.mindclue.ch



Re: Fuel serialize of 70MB takes forever on Linux vs. Mac

Mariano Martinez Peck


On Tue, Dec 19, 2017 at 7:40 AM, Andreas Brodbeck <[hidden email]> wrote:
> So: It's not the storage! It's something inside Fuel. I keep
> investigating and will update here.


Hi Andreas, 

Reading the thread, nothing obvious comes to mind, sorry. My gut feeling is that some of the primitives used by FLLargeIdentityDictionary have become slower in the latest VMs.
I guess the main one to check is #fuelPointsTo: (primitive 132; see its senders). Maybe you can do a quick test (isolated from your app) and compare against old Pharo?

Another thing would be comparing your scenario using the standard identity collections instead. Just save your image before doing this:

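| set dict |
"Keep the original Fuel classes in temps (they only live for this evaluation)"
set := FLLargeIdentitySet.
dict := FLLargeIdentityDictionary.
"Rebind the Fuel globals to the standard Pharo collections"
Smalltalk at: #FLLargeIdentitySet put: IdentitySet.
Smalltalk at: #FLLargeIdentityDictionary put: IdentityDictionary.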

And re-run the serialization.


Anyway, I can't think of much else right now.

Cheers,





Re: Fuel serialize of 70MB takes forever on Linux vs. Mac

Andreas Brodbeck-3
In reply to this post by Andreas Brodbeck-3
On 19.12.17 at 11:40, Andreas Brodbeck wrote:
> So: It's not the storage! It's something inside Fuel. I keep
> investigating and will update here.

... so here are my latest findings:

I took both images onto the same machine, a Mac, and ran the same Fuel
serialization. These are the differences:

*********************
Fuel 2.1.10 Pharo 6.1 (#60520) 64bit: 90 seconds
Fuel 2.1.10 Pharo 6.1 (#60527) 32bit: 245 seconds
*********************

For the record, here are the versions of the two VMs (same build):

VM 32bit: 5.0 5.0.201707201942 Mac OS X built on Jul 20 2017 21:45:23
UTC Compiler: 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)
[Production Spur VM]

VM 64bit: 5.0 5.0.201707201942 Mac OS X built on Jul 20 2017 21:08:05
UTC Compiler: 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)
[Production Spur 64-bit VM]



I didn't expect the 64-bit VM to be generally faster. What could be the
reason here?


Cheers, Andreas

--
Andreas Brodbeck
www.mindclue.ch



Re: Fuel serialize of 70MB takes forever on Linux vs. Mac

Mariano Martinez Peck


On Fri, Dec 22, 2017 at 9:40 AM, Andreas Brodbeck <[hidden email]> wrote:
> Fuel 2.1.10 Pharo 6.1 (#60520) 64bit: 90 seconds
> Fuel 2.1.10 Pharo 6.1 (#60527) 32bit: 245 seconds
>
> I didn't expect the 64bit version to be a faster VM generally. What
> could be the reason here?


We have special clusters for numbers, and when a number fits in one of them, serialization is faster (see #clusterClassForSmallInteger:). If it is bigger than the allowed range, it goes through the slower path used for variable/indexed objects.
Another possibility could be Floats.
Can you check in the debugger whether you have a cluster (an FLSmallIntegerCluster subclass) with MANY numbers?
If not, check whether you have many floats.

I think it must be related to that somehow...
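The word size matters here because of how Spur represents numbers: a 32-bit image keeps 31 bits of signed value in an immediate SmallInteger, while a 64-bit image keeps 61 bits. In addition, 64-bit Spur has immediate SmallFloat64 values, whereas every Float on a 32-bit image is a boxed object. The Python sketch below only covers the integer part and reports which values would be immediate SmallIntegers on each word size. It assumes (an interpretation of the explanation above, not taken from Fuel's source) that the faster number clusters apply only to values that are SmallIntegers in the running image; the helper names are hypothetical.

```python
"""Immediate SmallInteger ranges in Spur images, by word size.

Spur stores SmallIntegers directly in the tagged object pointer: 1 tag bit
on 32-bit images (31-bit signed values) and 3 tag bits on 64-bit images
(61-bit signed values). Integers outside that range become heap objects
(LargePositiveInteger / LargeNegativeInteger).
"""

from typing import Iterable

# Bits of signed value available to a SmallInteger, per image word size.
SMALLINT_VALUE_BITS = {32: 31, 64: 61}


def small_integer_range(word_size: int) -> tuple[int, int]:
    """Return (minVal, maxVal) of SmallInteger for a 32- or 64-bit image."""
    bits = SMALLINT_VALUE_BITS[word_size]
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def is_small_integer(value: int, word_size: int) -> bool:
    """True if value would be an immediate SmallInteger on that image."""
    low, high = small_integer_range(word_size)
    return low <= value <= high


def count_large_integers(values: Iterable[int], word_size: int) -> int:
    """Count values that would need a LargeInteger object on that image."""
    return sum(1 for v in values if not is_small_integer(v, word_size))


if __name__ == "__main__":
    sample = [42, 2**30, 1_513_612_500_000, -(2**30) - 1]  # made-up data
    for size in (32, 64):
        print(size, small_integer_range(size), count_large_integers(sample, size))
```

For example, a millisecond timestamp from December 2017 such as 1,513,612,500,000 is a LargePositiveInteger on a 32-bit image but a SmallInteger on a 64-bit one, so data full of values like that (a hypothetical case) would take a different serialization path on the two images.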

Cheers,


 


Re: Fuel serialize of 70MB takes forever on Linux vs. Mac

Andreas Brodbeck-3
In reply to this post by Mariano Martinez Peck
On 19.12.17 at 12:06, Mariano Martinez Peck wrote:

> Another thing would be comparing your scenario using the standard identity
> collections instead. Just save your image before doing this:
>
> | set dict |
> set := FLLargeIdentitySet.
> dict := FLLargeIdentityDictionary.
> Smalltalk at: #FLLargeIdentitySet put: IdentitySet.
> Smalltalk at: #FLLargeIdentityDictionary put: IdentityDictionary.
>
> And re-run the serialization.

I tried swapping the FLLarge* classes for the standard collections, as
Mariano suggested above. The results are astonishing:

**************************
240 seconds with FLLarge* classes
111 seconds with standard Pharo classes
**************************

That's a time reduction of more than 50% (about 54%)!

Does this mean that the FLLarge* classes are obsolete and can be replaced
by the standard classes (at least in newer Pharo versions)?

Cheers,
Andreas


--
Andreas Brodbeck
www.mindclue.ch