Pharo Smalltalk Developers

Pavel's ChangeLog week of 2018-01-22

Pavel Krivanek

Hi,

this week I did not have much time left for interesting work on Pharo. Some notes:

With Marcus, we looked at reviving the old option of placing method
source code directly into the method trailer, so that it is stored as
a byte array at the end of the compiled method. Because no extra object
is needed for the source code, it puts no additional stress on the GC,
and it is a very natural way to store code. Surprisingly, it was slower
than the current access to method sources stored on disk, which shows
how well optimized the current approach is.

Because most of the time was spent on UTF-8 conversion of the source,
we decided to introduce new method trailer kinds for wide strings and
for standard strings that do no conversion at all. For details, see
the e-mail "Speed up #embeddSourceInTrailer read and write" by Marcus.

I came across an interesting case where the internal representation of
a dictionary and the way hashes are constructed have an unexpected
effect on performance. Take an identity dictionary whose keys are words
and whose values are identity sets of the sentences that contain those
words. Then you want to sort its associations by value size to see
which words are used the most. The performance depends surprisingly
strongly on the sort order:

"Ascending by the number of sentences per word:"
[ identityWordsDict associations asSortedCollection: [ :a :b |
a value size <= b value size ] ] timeToRun.
"0:00:09:26.149, i.e. about 9.5 minutes"

...but if you swap the order:

"Descending, with a strict comparison:"
[ identityWordsDict associations asSortedCollection: [ :a :b |
a value size > b value size ] ] timeToRun.
"0:00:00:00.476, i.e. under half a second"

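Below is a minimal sketch of how such a dictionary could be built and then sorted so that the most frequent words come first. It uses toy data, not the original corpus. The three sentences are a hypothetical stand-in. Using Symbols as keys, so that equal words are identical objects, is an assumption made for illustration. At this size the timing difference above does not show up.

```smalltalk
| sentences identityWordsDict byFrequency |
"Hypothetical toy corpus standing in for the real sentences."
sentences := #('le chat dort' 'le chien dort' 'le chat mange').
"Map each word (as a Symbol) to the IdentitySet of sentences containing it."
identityWordsDict := IdentityDictionary new.
sentences do: [ :sentence |
	sentence substrings do: [ :word |
		(identityWordsDict at: word asSymbol ifAbsentPut: [ IdentitySet new ])
			add: sentence ] ].
"Most frequent words first, using the strict > comparison."
byFrequency := identityWordsDict associations asSortedCollection: [ :a :b |
	a value size > b value size ].
byFrequency first key.
```

Here the first key is #le, which occurs in all three sentences.
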
I found this while analyzing about 4.5 million French sentences from a
large set of books. The goal was to obtain a frequency analysis of the
word forms and to be able to collect supporting study material for
learning foreign-language vocabulary. You can select the words you know
and find a large set of long example sentences that contain only those
words, to see them in use and practice them. Then you can select a new
word and generate a new set of sentences that contain it together with
words you already know. This way you can learn the language
incrementally.
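
The sentence selection described above could be sketched roughly as follows. Everything here is hypothetical: the corpus, the known vocabulary and the new word #chien are toy values. Splitting with substrings only breaks on whitespace and ignores punctuation and capitalization, which real French text would need to handle.

```smalltalk
| sentences index known newWord onlyKnown withNewWord |
"Hypothetical toy corpus and vocabulary, for illustration only."
sentences := #('le chat dort' 'le chien dort' 'le chat mange la souris').
known := Set withAll: #(#le #chat #dort).
newWord := #chien.
"Index each word (as a Symbol) to the IdentitySet of sentences containing it."
index := IdentityDictionary new.
sentences do: [ :sentence |
	sentence substrings do: [ :word |
		(index at: word asSymbol ifAbsentPut: [ IdentitySet new ]) add: sentence ] ].
"Sentences made only of known words, longest first."
onlyKnown := (sentences select: [ :s |
		s substrings allSatisfy: [ :w | known includes: w asSymbol ] ])
	asSortedCollection: [ :a :b | a size >= b size ].
"Sentences containing the new word whose other words are all known."
withNewWord := (index at: newWord ifAbsent: [ #() ]) select: [ :s |
	s substrings allSatisfy: [ :w |
		w asSymbol == newWord or: [ known includes: w asSymbol ] ] ].
```

After practising, the new word can be added with known add: newWord, and the same selection can be repeated for the next word.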

The image with the data is over 2 GB, so it was a nice opportunity to
see how Pharo and its collections behave under such conditions.

Cheers,
-- Pavel