Smalltalk › Squeak › Squeak - Dev

Why is source code always in files only?


Tobias Pape

Why is source code always in files only?

Hi all,

We store method source _solely_ in files (.sources/.changes).
Why? We have the means to attach it to CompiledMethods; in fact, more than one:

CompiledMethod allInstances size. "57766."
CompiledMethod allInstances count: [:m | m properties includesKey: #source]. "0."
CompiledMethod allInstances count: [:m | m trailer sourceCode notNil]. "0."
CompiledMethod allInstances count: [:m | m trailer hasSourcePointer]. "57700."

" also interesting "
(CompiledMethod allInstances collect: [:m | m trailer kind] as: Bag) sortedCounts
"==> {57701->#SourcePointer . 65->#NoTrailer . 14->#TempsNamesQCompress . 2->#TempsNamesZip}"

When doing some analysis on source code, it is a pain to _either_
always go to disk for the source _or_ cache the code myself (which may
get out of sync soon).
Can't we just save the source code, either via the trailer or via properties,
on first access?

Best
-Tobias

Chris Muller-3

Re: Why is source code always in files only?

On Mon, Jan 19, 2015 at 6:45 AM, Tobias Pape <[hidden email]> wrote:

> Hi all,
>
> We store method source _solely_ in files (.sources/.changes).
> Why? We have means to attach it to Compiled methods, in fact, more than one:
>
> CompiledMethod allInstances size. "57766."
> CompiledMethod allInstances count: [:m | m properties includesKey: #source]. "0."
> CompiledMethod allInstances count: [:m | m trailer sourceCode notNil]. "0."
> CompiledMethod allInstances count: [:m | m trailer hasSourcePointer]. "57700."
>
> " also interesting "
> (CompiledMethod allInstances collect: [:m | m trailer kind] as: Bag) sortedCounts
> {57701->#SourcePointer . 65->#NoTrailer . 14->#TempsNamesQCompress . 2->#TempsNamesZip}
>
> When doing some analysis on source code, it is a pain to _either_
> always go to disk for the source _or_ cache the code myself (which may
> get out of sync sooon).

If you're sending messages instead of viewing private innards, why is it a pain?

> Can't we just save the source code either via trailer or properties
> on first access?

-1. Why do I want all of those Strings in my image?

> Best
> -Tobias

Tobias Pape

Re: Why is source code always in files only?

On 19.01.2015, at 18:34, Chris Muller <[hidden email]> wrote:

> On Mon, Jan 19, 2015 at 6:45 AM, Tobias Pape <[hidden email]> wrote:
> Hi all,
>
>
> We store method source _solely_ in files (.sources/.changes).
> Why? We have means to attach it to Compiled methods, in fact, more than one:
>
>
> CompiledMethod allInstances size. "57766."
> CompiledMethod allInstances count: [:m | m properties includesKey: #source]. "0."
> CompiledMethod allInstances count: [:m | m trailer sourceCode notNil]. "0."
> CompiledMethod allInstances count: [:m | m trailer hasSourcePointer]. "57700."
>
>
> " also interesting "
> (CompiledMethod allInstances collect: [:m | m trailer kind] as: Bag) sortedCounts
> {57701->#SourcePointer . 65->#NoTrailer . 14->#TempsNamesQCompress . 2->#TempsNamesZip}
>
>
> When doing some analysis on source code, it is a pain to _either_
> always go to disk for the source _or_ cache the code myself (which may
> get out of sync sooon).
>
> If you're sending messages instead of viewing private innards, why is it a pain?

What do you mean?

Calling getSource on a CM goes 300 km to disk instead of 1 m to memory (metaphorically speaking),
and when I do analysis on source code I typically do that a lot.
And as a developer I really dislike having to choose between either

a) bad performance due to excessive IO (yes, I want to access the source a lot), or
b) caching things myself when two ways of storing them are already available.

>
> Can't we just save the source code either via trailer or properties
> on first access?
>
> -1. Why do I want all of those String's in my image?

To do stuff to them.
Like analysing how many dots are in them, or how often someone crafts a Symbol.
Analysis stuff.
Currently, I have a separate structure that holds onto the code once it is retrieved
from disk. But once a method changes (e.g., on recompilation), I first have to detect
that it happened, and then flush and refill this cache. I find this tiresome.
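A minimal sketch of a cache that avoids the explicit detect-and-flush step, assuming (as is the case in Squeak) that editing or recompiling a method installs a new CompiledMethod object instead of modifying the old one. Keying the cache by the method object itself means a changed method simply misses the cache, and a WeakIdentityKeyDictionary does not by itself keep replaced methods alive. The names sourceCache and sourceOf are illustrative only.

```smalltalk
| sourceCache sourceOf |
"Cache keyed by the CompiledMethod object itself (compared by identity).
An edited or recompiled method is a different object, so it misses and is re-read;
the weak keys do not by themselves keep replaced methods alive."
sourceCache := WeakIdentityKeyDictionary new.
sourceOf := [ :method |
	sourceCache at: method ifAbsentPut: [ method getSource asString ] ].

"The first call reads from the .sources/.changes files; the second is answered from memory."
sourceOf value: Object >> #printString.
sourceOf value: Object >> #printString.
```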

Best
-Tobias


PS: HTML mails f*ck up quotation levels in replies :(
Apple Mail just flattens them when I reply. Does anyone know a workaround?

Levente Uzonyi-2

Re: Why is source code always in files only?

On Mon, 19 Jan 2015, Tobias Pape wrote:

>
> On 19.01.2015, at 18:34, Chris Muller <[hidden email]> wrote:
>
>> On Mon, Jan 19, 2015 at 6:45 AM, Tobias Pape <[hidden email]> wrote:
>> Hi all,
>>
>>
>> We store method source _solely_ in files (.sources/.changes).
>> Why? We have means to attach it to Compiled methods, in fact, more than one:
>>
>>
>> CompiledMethod allInstances size. "57766."
>> CompiledMethod allInstances count: [:m | m properties includesKey: #source]. "0."
>> CompiledMethod allInstances count: [:m | m trailer sourceCode notNil]. "0."
>> CompiledMethod allInstances count: [:m | m trailer hasSourcePointer]. "57700."
>>
>>
>> " also interesting "
>> (CompiledMethod allInstances collect: [:m | m trailer kind] as: Bag) sortedCounts
>> {57701->#SourcePointer . 65->#NoTrailer . 14->#TempsNamesQCompress . 2->#TempsNamesZip}
>>
>>
>> When doing some analysis on source code, it is a pain to _either_
>> always go to disk for the source _or_ cache the code myself (which may
>> get out of sync sooon).
>>
>> If you're sending messages instead of viewing private innards, why is it a pain?
>
> What do you mean?
>
> Calling getSource on a CM goes 300km to disk instead of 1m to memory (metaphorically spoken)
> and when I do analysis on source code I typically do stuff like that a lot.
> And as developer I really dislike that I have to choose between either
>
> a) bad performance due to excessive IO (yes I want to access the source a lot)
> b) caching things myself when already two ways of storing them are available.

On today's machines you don't have to. Once you read the data from the
disk, it'll be cached in memory. It would be faster to access the sources
if they were stored in a trailer, but that would bump the image size by
about 15 MB (uncompressed) or 9 MB (compressed):

| size compressedSize |
size := compressedSize := 0.
CurrentReadOnlySourceFiles cacheDuring: [
	SystemNavigation default allSelectorsAndMethodsDo: [ :behavior :selector :method |
		| string compressed |
		string := method getSource asString.
		compressed := string squeakToUtf8 zipped.
		"Payload bytes plus an estimated object header: 4 bytes, or 8 for objects larger than 255 bytes"
		size := size + string byteSize + ((string size > 255) asBit + 1 * 4).
		compressedSize := compressedSize + compressed byteSize + ((compressed size > 255) asBit + 1 * 4) ] ].
{ size. compressedSize }.

"==> #(15003880 9057408)"

>
>>
>> Can't we just save the source code either via trailer or properties
>> on first access?
>>
>> -1. Why do I want all of those String's in my image?
>
> To do stuff to them.
> Like, analysing how many dots are in them, or how often someone crafts a Symbol.
> Analysis stuff.
> Currently, I have a separate structure that holds onto the code once retrieved
> from disk. But once the method change (eg, recompilation) I have to first detect,
> that it happened, and second flush and refill this cache. I find this tiresome.

Do you flush your cache selectively?

Scanning all source code for a given pattern takes less than a second
(~800 ms) on my machine. What's your performance goal?
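A scan of the kind timed here might look like the following minimal sketch, which counts the dots in all method source (one of the analyses Tobias mentions). It reuses the cacheDuring: wrapper from the measurement above; wrapping the whole expression in `[ ... ] timeToRun` gives a figure comparable to the ~800 ms, though the number will depend on the machine and image.

```smalltalk
| dots |
dots := 0.
"Wrapped in cacheDuring:, as in the measurement above, to keep source access cheap"
CurrentReadOnlySourceFiles cacheDuring: [
	SystemNavigation default allSelectorsAndMethodsDo: [ :behavior :selector :method |
		"Add the number of $. characters in this method's source"
		dots := dots + (method getSource asString occurrencesOf: $.) ] ].
dots
```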

Levente


Tobias Pape

Re: Why is source code always in files only?

On 19.01.2015, at 21:51, Levente Uzonyi <[hidden email]> wrote:

> On Mon, 19 Jan 2015, Tobias Pape wrote:
>
>>
>> On 19.01.2015, at 18:34, Chris Muller <[hidden email]> wrote:
>>
>>> On Mon, Jan 19, 2015 at 6:45 AM, Tobias Pape <[hidden email]> wrote:
>>> Hi all,
>>>
>>>
>>> We store method source _solely_ in files (.sources/.changes).
>>> Why? We have means to attach it to Compiled methods, in fact, more than one:
>>>
>>>
>>> CompiledMethod allInstances size. "57766."
>>> CompiledMethod allInstances count: [:m | m properties includesKey: #source]. "0."
>>> CompiledMethod allInstances count: [:m | m trailer sourceCode notNil]. "0."
>>> CompiledMethod allInstances count: [:m | m trailer hasSourcePointer]. "57700."
>>>
>>>
>>> " also interesting "
>>> (CompiledMethod allInstances collect: [:m | m trailer kind] as: Bag) sortedCounts
>>> {57701->#SourcePointer . 65->#NoTrailer . 14->#TempsNamesQCompress . 2->#TempsNamesZip}
>>>
>>>
>>> When doing some analysis on source code, it is a pain to _either_
>>> always go to disk for the source _or_ cache the code myself (which may
>>> get out of sync sooon).
>>>
>>> If you're sending messages instead of viewing private innards, why is it a pain?
>>
>> What do you mean?
>>
>> Calling getSource on a CM goes 300km to disk instead of 1m to memory (metaphorically spoken)
>> and when I do analysis on source code I typically do stuff like that a lot.
>> And as developer I really dislike that I have to choose between either
>>
>> a) bad performance due to excessive IO (yes I want to access the source a lot)
>> b) caching things myself when already two ways of storing them are available.
>
> On today's machines you don't have to. Once you read the data from the disk, it'll be cached in memory. It would be faster to access the sources, if they were stored in a trailer, but that would bump the image size by about 15 MB (uncompressed), or 9 MB (compressed):
>

I understand. But for a development image, I'd take that burden.

> | size compressedSize |
> size := compressedSize := 0.
> CurrentReadOnlySourceFiles cacheDuring: [
> SystemNavigation default allSelectorsAndMethodsDo: [ :behavior :selector :method |
> | string compressed |
> string := method getSource asString.
> compressed := string squeakToUtf8 zipped.
> size := size + string byteSize + ((string size > 255) asBit + 1 * 4).
> compressedSize := compressedSize + compressed byteSize + ((compressed size > 255) asBit + 1 * 4) ] ].
> { size. compressedSize }.
>
> "==> #(15003880 9057408)"

What I am actually wondering about is this:
there are two completely different ways to _access_ source stored in the image,
but no way to actually _store_ it there.

>
>>
>>>
>>> Can't we just save the source code either via trailer or properties
>>> on first access?
>>>
>>> -1. Why do I want all of those String's in my image?
>>
>> To do stuff to them.
>> Like, analysing how many dots are in them, or how often someone crafts a Symbol.
>> Analysis stuff.
>> Currently, I have a separate structure that holds onto the code once retrieved
>> from disk. But once the method change (eg, recompilation) I have to first detect,
>> that it happened, and second flush and refill this cache. I find this tiresome.
>
> Do you flush your cache selectively?

No, I can't for reasons :)

>
> Scanning all source code for a given pattern takes less than a second (~800 ms) on my machine. What's your performance goal?

I have ~15,000 methods that I have to compare line by line against each other.
Doing that by going to the file system every time just kills it.

Best
-Tobias

Nicolas Cellier

Re: Why is source code always in files only?

Hi Tobias,
are you aware of CurrentReadOnlySourceFiles cacheDuring: [...]?

This works around the readOnlyCopy used for thread safety, which is the main performance killer...
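To see the effect described here, one could time the same source reads with and without the wrapper. A rough sketch, using the methods of Object as an arbitrary sample; absolute numbers will vary, and because the first run also warms the operating system's file cache, it is worth repeating the measurement or swapping the order.

```smalltalk
| methods uncached cached |
"An arbitrary sample: all methods defined directly in Object"
methods := Object methodDict values.
"Plain source access"
uncached := [ methods do: [ :each | each getSource ] ] timeToRun.
"The same reads inside cacheDuring:"
cached := [ CurrentReadOnlySourceFiles cacheDuring: [
	methods do: [ :each | each getSource ] ] ] timeToRun.
{ uncached. cached }
```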


Tobias Pape

Re: Why is source code always in files only?

Hi,

On 19.01.2015, at 22:31, Nicolas Cellier <[hidden email]> wrote:

> Hi Tobias,
> are you aware of CurrentReadOnlySourceFiles cacheDuring: [...]
> This is to workaround the readOnlyCopy used for thread safety which is the main killer of performance...
>

No, I was not aware. :)
Thanks for that information.

Best
-Tobias


Levente Uzonyi-2

Re: Why is source code always in files only?

In reply to this post by Tobias Pape

On Mon, 19 Jan 2015, Tobias Pape wrote:

>
> On 19.01.2015, at 21:51, Levente Uzonyi <[hidden email]> wrote:
>
>> On Mon, 19 Jan 2015, Tobias Pape wrote:
>>
>>>
>>> On 19.01.2015, at 18:34, Chris Muller <[hidden email]> wrote:
>>>
>>>> On Mon, Jan 19, 2015 at 6:45 AM, Tobias Pape <[hidden email]> wrote:
>>>> Hi all,
>>>>
>>>>
>>>> We store method source _solely_ in files (.sources/.changes).
>>>> Why? We have means to attach it to Compiled methods, in fact, more than one:
>>>>
>>>>
>>>> CompiledMethod allInstances size. "57766."
>>>> CompiledMethod allInstances count: [:m | m properties includesKey: #source]. "0."
>>>> CompiledMethod allInstances count: [:m | m trailer sourceCode notNil]. "0."
>>>> CompiledMethod allInstances count: [:m | m trailer hasSourcePointer]. "57700."
>>>>
>>>>
>>>> " also interesting "
>>>> (CompiledMethod allInstances collect: [:m | m trailer kind] as: Bag) sortedCounts
>>>> {57701->#SourcePointer . 65->#NoTrailer . 14->#TempsNamesQCompress . 2->#TempsNamesZip}
>>>>
>>>>
>>>> When doing some analysis on source code, it is a pain to _either_
>>>> always go to disk for the source _or_ cache the code myself (which may
>>>> get out of sync sooon).
>>>>
>>>> If you're sending messages instead of viewing private innards, why is it a pain?
>>>
>>> What do you mean?
>>>
>>> Calling getSource on a CM goes 300km to disk instead of 1m to memory (metaphorically spoken)
>>> and when I do analysis on source code I typically do stuff like that a lot.
>>> And as developer I really dislike that I have to choose between either
>>>
>>> a) bad performance due to excessive IO (yes I want to access the source a lot)
>>> b) caching things myself when already two ways of storing them are available.
>>
>> On today's machines you don't have to. Once you read the data from the disk, it'll be cached in memory. It would be faster to access the sources, if they were stored in a trailer, but that would bump the image size by about 15 MB (uncompressed), or 9 MB (compressed):
>>
>
> I understand. But for a development image, I'd take that burden.
>
>> | size compressedSize |
>> size := compressedSize := 0.
>> CurrentReadOnlySourceFiles cacheDuring: [
>> SystemNavigation default allSelectorsAndMethodsDo: [ :behavior :selector :method |
>> | string compressed |
>> string := method getSource asString.
>> compressed := string squeakToUtf8 zipped.
>> size := size + string byteSize + ((string size > 255) asBit + 1 * 4).
>> compressedSize := compressedSize + compressed byteSize + ((compressed size > 255) asBit + 1 * 4) ] ].
>> { size. compressedSize }.
>>
>> "==> #(15003880 9057408)"
>
>
> What I am actually wondering about,
> there are two completely different ways to _access_ source stored in the image
> but no way to actually _store_ it there.

You can use #dropSourcePointer to embed the source of a method in the
image. For 15k methods you'd better swap the methods with custom code that
converts them in a single batch.
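A minimal sketch of the one-method-at-a-time variant, assuming (as this answer and the follow-up below suggest) that #dropSourcePointer replaces a method's source pointer with the source embedded in its trailer. `MyAnalysisTarget` is a hypothetical class name; substitute one of your own.

```smalltalk
"MyAnalysisTarget is a hypothetical class; replace it with a real one"
CurrentReadOnlySourceFiles cacheDuring: [
	MyAnalysisTarget methodDict values do: [ :method | method dropSourcePointer ] ].

"Re-fetch the methods and count those whose trailer now carries the source,
using the same query as the original post"
MyAnalysisTarget methodDict values count: [ :method | method trailer sourceCode notNil ]
```

This converts methods one by one; for the ~15k methods in question, the advice above is to swap them in a single batch instead.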

>
>>
>>>
>>>>
>>>> Can't we just save the source code either via trailer or properties
>>>> on first access?
>>>>
>>>> -1. Why do I want all of those String's in my image?
>>>
>>> To do stuff to them.
>>> Like, analysing how many dots are in them, or how often someone crafts a Symbol.
>>> Analysis stuff.
>>> Currently, I have a separate structure that holds onto the code once retrieved
>>> from disk. But once the method change (eg, recompilation) I have to first detect,
>>> that it happened, and second flush and refill this cache. I find this tiresome.
>>
>> Do you flush your cache selectively?
>
> No, I can't for reasons :)
>
>>
>> Scanning all source code for a given pattern takes less than a second (~800 ms) on my machine. What's your performance goal?
>
> I have ~15.000 Methods that I have to compare line by line against each other.
> Doing that by going to the filesystem just kills it.

It's hard to tell much without knowing the exact problem. If you want to
take a method and compare it with all previously processed methods line by
line, then you can create a dictionary which maps lines to methods (or
method-line number pairs).
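A minimal sketch of that dictionary, mapping each source line to the method/line-number pairs that contain it. Trimming whitespace and skipping blank lines are assumptions about what should count as the same line, and the code assumes `String>>lines` and `withBlanksTrimmed` are available in the image.

```smalltalk
| index duplicates |
index := Dictionary new.
CurrentReadOnlySourceFiles cacheDuring: [
	SystemNavigation default allSelectorsAndMethodsDo: [ :behavior :selector :method |
		method getSource asString lines withIndexDo: [ :line :lineNumber |
			| key |
			"Ignore indentation differences and blank lines"
			key := line withBlanksTrimmed.
			key isEmpty ifFalse: [
				(index at: key ifAbsentPut: [ OrderedCollection new ])
					add: method -> lineNumber ] ] ] ].
"Lines that occur in more than one place"
duplicates := index select: [ :occurrences | occurrences size > 1 ].
```

With the index in place, checking a method against all previously processed ones costs one dictionary lookup per line rather than a pairwise comparison.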

Levente


Levente Uzonyi-2

Re: Why is source code always in files only?

In reply to this post by Nicolas Cellier

Yup, that makes a difference. Too bad it's still notification-based
instead of using a process-local variable. That would make it even faster.

Levente

On Mon, 19 Jan 2015, Nicolas Cellier wrote:

> Hi Tobias,
> are you aware of CurrentReadOnlySourceFiles cacheDuring: [...]
> This is to workaround the readOnlyCopy used for thread safety which is the main killer of performance...

Tobias Pape

Re: Why is source code always in files only?

In reply to this post by Levente Uzonyi-2

Hey

On 19.01.2015, at 22:56, Levente Uzonyi <[hidden email]> wrote:

> On Mon, 19 Jan 2015, Tobias Pape wrote:
>
>>
>> On 19.01.2015, at 21:51, Levente Uzonyi <[hidden email]> wrote:
>>
>>> On Mon, 19 Jan 2015, Tobias Pape wrote:
>>>
>> […]
>> What I am actually wondering about,
>> there are two completely different ways to _access_ source stored in the image
>> but no way to actually _store_ it there.
>
> You can use #dropSourcePointer to embed the source of a method in the image. For 15k methods you better swap the methods with custom code which converts them in a single batch.

Ah. So this goes into the trailer?
I think I actually misread some parts of CompiledMethod>>#sourceCode: and
thought that code was dysfunctional. My bad.
Thanks for resolving this mystery.

The second one still remains:

CompiledMethod>>getSourceFor: selector in: class
	"Retrieve or reconstruct the source code for this method."
	| trailer source |
	(self properties includesKey: #source) ifTrue:
		[^self properties at: #source].
	trailer := self trailer.
	" ... "

Judging from my image, the #source property is never written, right?
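The original proposal, storing the source in the #source property on first access, could be sketched as below. This is a hypothetical helper, not existing Squeak code, and it assumes the `propertyValueAt:ifAbsent:` and `propertyValueAt:put:` accessors on CompiledMethod. Judging from the getSourceFor:in: excerpt above, later getSource sends to the same method should then be answered from memory; an edited or recompiled method is a new CompiledMethod without the property, so the cached copy should not go stale. Note that it modifies the method it is given.

```smalltalk
| sourceOf |
"Hypothetical helper: answer the source, storing it in the #source property on first fetch"
sourceOf := [ :method |
	method propertyValueAt: #source ifAbsent: [
		| source |
		source := method getSource asString.
		method propertyValueAt: #source put: source.
		source ] ].

sourceOf value: Object >> #printString.
```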

>
>>
>>>
>>>>
>>>>>
>>>>> Can't we just save the source code either via trailer or properties
>>>>> on first access?
>>>>>
>>>>> -1. Why do I want all of those String's in my image?
>>>>
>>>> To do stuff to them.
>>>> Like, analysing how many dots are in them, or how often someone crafts a Symbol.
>>>> Analysis stuff.
>>>> Currently, I have a separate structure that holds onto the code once retrieved
>>>> from disk. But once the method change (eg, recompilation) I have to first detect,
>>>> that it happened, and second flush and refill this cache. I find this tiresome.
>>>
>>> Do you flush your cache selectively?
>>
>> No, I can't for reasons :)
>>
>>>
>>> Scanning all source code for a given pattern takes less than a second (~800 ms) on my machine. What's your performance goal?
>>
>> I have ~15.000 Methods that I have to compare line by line against each other.
>> Doing that by going to the filesystem just kills it.
>
> It's hard to tell much without knowing the exact problem. If you want to take a method and compare it with all previously processed methods line by line, then you can create a dictionary which maps lines to methods (or method-line number pairs).
>

Well, what I did in the meantime (besides caching the source)
was to introduce an intermediate object that
a) holds onto the string for a line of source code,
b) compares by identity, not by its string's content, and
c) is interned via a Dictionary on the class side
(a bit like Symbols, but I didn't want to misuse them).

That way, I can resort to identity-based duplicate checking :)
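A simplified sketch of the same idea, which interns the line strings themselves in a Dictionary instead of wrapping them in a dedicated class (a swapped-in simplification, not the actual code described above). Once every line has been interned, two lines are equal exactly when they are identical, so duplicate checks can use an IdentitySet.

```smalltalk
| interned intern first second seen duplicates |
"Simplified interning: one canonical String per distinct line content"
interned := Dictionary new.
intern := [ :aString | interned at: aString ifAbsentPut: [ aString ] ].

"copy makes two distinct but equal strings, as lines read from different methods would be"
first := intern value: 'self foo.' copy.
second := intern value: 'self foo.' copy.
first == second. "==> true"

"Identity-based duplicate check over interned lines"
seen := IdentitySet new.
duplicates := (Array with: first with: second) count: [ :line |
	(seen includes: line) or: [ seen add: line. false ] ].
duplicates "==> 1"
```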

Best
-Tobias


Eliot Miranda-2

Re: Why is source code always in files only?

In reply to this post by Nicolas Cellier

On Mon, Jan 19, 2015 at 1:31 PM, Nicolas Cellier <[hidden email]> wrote:

> Hi Tobias,
> are you aware of CurrentReadOnlySourceFiles cacheDuring: [...]
> This is to workaround the readOnlyCopy used for thread safety which is the main killer of performance...

IMO this is a bug. We should simply have a single read-only copy of each sources file and modify the debugger to either save and restore the state of that read-only copy around accessing source, or use its own read-only copy (except that the latter approach breaks when one debugs the debugger). For anything that accesses source, the performance difference between using CurrentReadOnlySourceFiles cacheDuring: [...] and not using it is huge. And CurrentReadOnlySourceFiles cacheDuring: [...] is a /lot/ of verbiage to type in doits, and a sign that something is wrong.

2015-01-19 22:10 GMT+01:00 Tobias Pape <[hidden email]>:

On 19.01.2015, at 21:51, Levente Uzonyi <[hidden email]> wrote:

> On Mon, 19 Jan 2015, Tobias Pape wrote:
>
>>
>> On 19.01.2015, at 18:34, Chris Muller <[hidden email]> wrote:
>>
>>> On Mon, Jan 19, 2015 at 6:45 AM, Tobias Pape <[hidden email]> wrote:
>>> Hi all,
>>>
>>>
>>> We store method source _solely_ in files (.sources/.changes).
>>> Why? We have means to attach it to Compiled methods, in fact, more than one:
>>>
>>>
>>> CompiledMethod allInstances size. "57766."
>>> CompiledMethod allInstances count: [:m | m properties includesKey: #source]. "0."
>>> CompiledMethod allInstances count: [:m | m trailer sourceCode notNil]. "0."
>>> CompiledMethod allInstances count: [:m | m trailer hasSourcePointer]. "57700."
>>>
>>>
>>> " also interesting "
>>> (CompiledMethod allInstances collect: [:m | m trailer kind] as: Bag) sortedCounts
>>> {57701->#SourcePointer . 65->#NoTrailer . 14->#TempsNamesQCompress . 2->#TempsNamesZip}
>>>
>>>
>>> When doing some analysis on source code, it is a pain to _either_
>>> always go to disk for the source _or_ cache the code myself (which may
>>> get out of sync soon).
>>>
>>> If you're sending messages instead of viewing private innards, why is it a pain?
>>
>> What do you mean?
>>
>> Calling getSource on a CM goes 300km to disk instead of 1m to memory (metaphorically speaking)
>> and when I do analysis on source code I typically do stuff like that a lot.
>> And as developer I really dislike that I have to choose between either
>>
>> a) bad performance due to excessive IO (yes I want to access the source a lot)
>> b) caching things myself when two ways of storing them are already available.
>
> On today's machines you don't have to. Once you read the data from the disk, it'll be cached in memory. It would be faster to access the sources if they were stored in a trailer, but that would bump the image size by about 15 MB (uncompressed), or 9 MB (compressed):
>

I understand. But for a development image, I'd take that burden.

> | size compressedSize |
> size := compressedSize := 0.
> CurrentReadOnlySourceFiles cacheDuring: [
> SystemNavigation default allSelectorsAndMethodsDo: [ :behavior :selector :method |
> | string compressed |
> string := method getSource asString.
> compressed := string squeakToUtf8 zipped.
> size := size + string byteSize + ((string size > 255) asBit + 1 * 4).
> compressedSize := compressedSize + compressed byteSize + ((compressed size > 255) asBit + 1 * 4) ] ].
> { size. compressedSize }.
>
> "==> #(15003880 9057408)"

What I am actually wondering about,
there are two completely different ways to _access_ source stored in the image
but no way to actually _store_ it there.

>
>>
>>>
>>> Can't we just save the source code either via trailer or properties
>>> on first access?
>>>
>>> -1. Why do I want all of those Strings in my image?
>>
>> To do stuff to them.
>> Like, analysing how many dots are in them, or how often someone crafts a Symbol.
>> Analysis stuff.
>> Currently, I have a separate structure that holds onto the code once retrieved
>> from disk. But once a method changes (e.g., on recompilation), I have to first detect
>> that it happened, and then flush and refill this cache. I find this tiresome.
>
> Do you flush your cache selectively?

No, I can't for reasons :)
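A cache that needs no explicit flushing could key the source on the CompiledMethod object itself. Nobody in the thread proposes this. It assumes that recompiling a method always installs a new CompiledMethod object, so a stale entry is simply never looked up again. Identity keys matter because CompiledMethod overrides `=`. Weak keys let entries for replaced methods be collected. `cache` and `sourceOf` are hypothetical names:

```smalltalk
| cache sourceOf method |
"Weak identity keys: entries for methods that were replaced by recompilation
 and are no longer referenced elsewhere can be garbage collected."
cache := WeakIdentityKeyDictionary new.
"Go to disk only the first time a given CompiledMethod object is seen."
sourceOf := [:aCompiledMethod |
	cache at: aCompiledMethod ifAbsentPut: [aCompiledMethod getSource asString]].
method := Object >> #printString.
"The second call is answered from memory with the very same String."
(sourceOf value: method) == (sourceOf value: method)
```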

>
> Scanning all source code for a given pattern takes less than a second (~800 ms) on my machine. What's your performance goal?

I have ~15,000 methods that I have to compare line by line against each other.
Doing that by going to the filesystem just kills it.

Best
-Tobias

best,

Eliot

Levente Uzonyi-2

Read-only source files (was: Re: [squeak-dev] Why is source code always in files only?)

On Mon, 19 Jan 2015, Eliot Miranda wrote:

>
>
> On Mon, Jan 19, 2015 at 1:31 PM, Nicolas Cellier <[hidden email]> wrote:
> Hi Tobias,
> are you aware of CurrentReadOnlySourceFiles cacheDuring: [...]
> This is to work around the readOnlyCopy used for thread safety, which is the main killer of performance...
>
>
> IMO this is a bug. We should simply have a single read-only copy of each sources file and modify the debugger to either save and restore the state of a read-only copy around accessing source, or use its own
> read-only copy (except that the latter approach breaks when one debugs the debugger). The difference in performance between using CurrentReadOnlySourceFiles cacheDuring: [...] and not using it in anything that
> accesses source is huge. And CurrentReadOnlySourceFiles cacheDuring: [...] is a /lot/ of verbiage to type in doits, and a sign that something is wrong.

How would using a single copy solve the concurrency issues?

I think the real solution would be to use per-process copies, which would be
initialized lazily and closed automatically after some period of
inactivity.
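Per-process copies could be organized roughly as below. In this sketch, `openCopy` and `closeCopy` are hypothetical stand-ins for opening and closing a read-only copy of a sources file, and all other names are illustrative. A real version would also need a Mutex around the two tables, a background process calling `closeIdleFor` periodically, and weak keys so that dead processes do not linger:

```smalltalk
| copies lastUse openCopy closeCopy copyFor closeIdleFor closed |
closed := 0.
"Hypothetical stand-ins for opening and closing a read-only sources file copy."
openCopy := [ReadStream on: 'source text'].
closeCopy := [:copy | closed := closed + 1].
copies := IdentityDictionary new.  "Process -> its private copy"
lastUse := IdentityDictionary new.  "Process -> millisecond clock at last use"
"Answer aProcess's own copy, creating it lazily on first use."
copyFor := [:aProcess |
	lastUse at: aProcess put: Time millisecondClockValue.
	copies at: aProcess ifAbsentPut: openCopy].
"Close and forget every copy idle for at least ms milliseconds
 (clock wrap-around is ignored in this sketch)."
closeIdleFor := [:ms | | now |
	now := Time millisecondClockValue.
	(lastUse keys select: [:p | now - (lastUse at: p) >= ms]) do: [:p |
		closeCopy value: (copies removeKey: p).
		lastUse removeKey: p]].
"Usage: two requests from the same process share one copy; then reap it."
(copyFor value: Processor activeProcess) == (copyFor value: Processor activeProcess).
closeIdleFor value: 0.
closed
```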

Levente

Eliot Miranda-2

Re: Read-only source files (was: Re: [squeak-dev] Why is source code always in files only?)

On Mon, Jan 19, 2015 at 5:09 PM, Levente Uzonyi <[hidden email]> wrote:

On Mon, 19 Jan 2015, Eliot Miranda wrote:

On Mon, Jan 19, 2015 at 1:31 PM, Nicolas Cellier <[hidden email]> wrote:
Hi Tobias,
are you aware of CurrentReadOnlySourceFiles cacheDuring: [...]
This is to work around the readOnlyCopy used for thread safety, which is the main killer of performance...

IMO this is a bug. We should simply have a single read-only copy of each sources file and modify the debugger to either save and restore the state of a read-only copy around accessing source, or use its own
read-only copy (except that the latter approach breaks when one debugs the debugger). The difference in performance between using CurrentReadOnlySourceFiles cacheDuring: [...] and not using it in anything that
accesses source is huge. And CurrentReadOnlySourceFiles cacheDuring: [...] is a /lot/ of verbiage to type in doits, and a sign that something is wrong.

How would using a single copy solve the concurrency issues?

It wouldn't, but what issues are you seeing in concurrent source access? VW doesn't even have read-only copies and AFAICR we never had complaints about this. Is there really a thread-safety issue here?

I think the real solution would be to use per-process copies, which would be initialized lazily and closed automatically after some period of inactivity.

If concurrent access was really an issue then OK. But first I'd like some evidence that there's a real problem here.

best,

Eliot

Nicolas Cellier

Re: Read-only source files (was: Re: [squeak-dev] Why is source code always in files only?)

2015-01-20 2:18 GMT+01:00 Eliot Miranda <[hidden email]>:

On Mon, Jan 19, 2015 at 5:09 PM, Levente Uzonyi <[hidden email]> wrote:
On Mon, 19 Jan 2015, Eliot Miranda wrote:

On Mon, Jan 19, 2015 at 1:31 PM, Nicolas Cellier <[hidden email]> wrote:
Hi Tobias,
are you aware of CurrentReadOnlySourceFiles cacheDuring: [...]
This is to work around the readOnlyCopy used for thread safety, which is the main killer of performance...

IMO this is a bug. We should simply have a single read-only copy of each sources file and modify the debugger to either save and restore the state of a read-only copy around accessing source, or use its own
read-only copy (except that the latter approach breaks when one debugs the debugger). The difference in performance between using CurrentReadOnlySourceFiles cacheDuring: [...] and not using it in anything that
accesses source is huge. And CurrentReadOnlySourceFiles cacheDuring: [...] is a /lot/ of verbiage to type in doits, and a sign that something is wrong.

How would using a single copy solve the concurrency issues?

It wouldn't, but what issues are you seeing in concurrent source access? VW doesn't even have read-only copies and AFAICR we never had complaints about this. Is there really a thread-safety issue here?

Couldn't there be a difference because VW properly handles a read-append stream for the underlying FILE?

I have the feeling that such a FILE is more robust to single-writer/multiple-reader concurrency than a random-access read-write FILE as used by Squeak...

I think the real solution would be to use per-process copies, which would be initialized lazily and closed automatically after some period of inactivity.

If concurrent access was really an issue then OK. But first I'd like some evidence that there's a real problem here.
--
best,
Eliot

Nicolas Cellier

Re: Why is source code always in files only?

In reply to this post by Eliot Miranda-2

2015-01-20 1:29 GMT+01:00 Eliot Miranda <[hidden email]>:

On Mon, Jan 19, 2015 at 1:31 PM, Nicolas Cellier <[hidden email]> wrote:
Hi Tobias,
are you aware of CurrentReadOnlySourceFiles cacheDuring: [...]
This is to work around the readOnlyCopy used for thread safety, which is the main killer of performance...

IMO this is a bug. We should simply have a single read-only copy of each sources file and modify the debugger to either save and restore the state of a read-only copy around accessing source, or use its own read-only copy (except that the latter approach breaks when one debugs the debugger). The difference in performance between using CurrentReadOnlySourceFiles cacheDuring: [...] and not using it in anything that accesses source is huge. And CurrentReadOnlySourceFiles cacheDuring: [...] is a /lot/ of verbiage to type in doits, and a sign that something is wrong.

Yes, it smells... It's not our business; an encapsulation is missing.

2015-01-19 22:10 GMT+01:00 Tobias Pape <[hidden email]>:

On 19.01.2015, at 21:51, Levente Uzonyi <[hidden email]> wrote:

> On Mon, 19 Jan 2015, Tobias Pape wrote:
>
>>
>> On 19.01.2015, at 18:34, Chris Muller <[hidden email]> wrote:
>>
>>> On Mon, Jan 19, 2015 at 6:45 AM, Tobias Pape <[hidden email]> wrote:
>>> Hi all,
>>>
>>>
>>> We store method source _solely_ in files (.sources/.changes).
>>> Why? We have means to attach it to Compiled methods, in fact, more than one:
>>>
>>>
>>> CompiledMethod allInstances size. "57766."
>>> CompiledMethod allInstances count: [:m | m properties includesKey: #source]. "0."
>>> CompiledMethod allInstances count: [:m | m trailer sourceCode notNil]. "0."
>>> CompiledMethod allInstances count: [:m | m trailer hasSourcePointer]. "57700."
>>>
>>>
>>> " also interesting "
>>> (CompiledMethod allInstances collect: [:m | m trailer kind] as: Bag) sortedCounts
>>> {57701->#SourcePointer . 65->#NoTrailer . 14->#TempsNamesQCompress . 2->#TempsNamesZip}
>>>
>>>
>>> When doing some analysis on source code, it is a pain to _either_
>>> always go to disk for the source _or_ cache the code myself (which may
>>> get out of sync soon).
>>>
>>> If you're sending messages instead of viewing private innards, why is it a pain?
>>
>> What do you mean?
>>
>> Calling getSource on a CM goes 300km to disk instead of 1m to memory (metaphorically speaking)
>> and when I do analysis on source code I typically do stuff like that a lot.
>> And as developer I really dislike that I have to choose between either
>>
>> a) bad performance due to excessive IO (yes I want to access the source a lot)
>> b) caching things myself when two ways of storing them are already available.
>
> On today's machines you don't have to. Once you read the data from the disk, it'll be cached in memory. It would be faster to access the sources if they were stored in a trailer, but that would bump the image size by about 15 MB (uncompressed), or 9 MB (compressed):
>

I understand. But for a development image, I'd take that burden.

> | size compressedSize |
> size := compressedSize := 0.
> CurrentReadOnlySourceFiles cacheDuring: [
> SystemNavigation default allSelectorsAndMethodsDo: [ :behavior :selector :method |
> | string compressed |
> string := method getSource asString.
> compressed := string squeakToUtf8 zipped.
> size := size + string byteSize + ((string size > 255) asBit + 1 * 4).
> compressedSize := compressedSize + compressed byteSize + ((compressed size > 255) asBit + 1 * 4) ] ].
> { size. compressedSize }.
>
> "==> #(15003880 9057408)"

What I am actually wondering about,
there are two completely different ways to _access_ source stored in the image
but no way to actually _store_ it there.

>
>>
>>>
>>> Can't we just save the source code either via trailer or properties
>>> on first access?
>>>
>>> -1. Why do I want all of those Strings in my image?
>>
>> To do stuff to them.
>> Like, analysing how many dots are in them, or how often someone crafts a Symbol.
>> Analysis stuff.
>> Currently, I have a separate structure that holds onto the code once retrieved
>> from disk. But once a method changes (e.g., on recompilation), I have to first detect
>> that it happened, and then flush and refill this cache. I find this tiresome.
>
> Do you flush your cache selectively?

No, I can't for reasons :)

>
> Scanning all source code for a given pattern takes less than a second (~800 ms) on my machine. What's your performance goal?

I have ~15,000 methods that I have to compare line by line against each other.
Doing that by going to the filesystem just kills it.

Best
-Tobias
--
best,
Eliot

timrowledge

Re: Read-only source files (was: Re: [squeak-dev] Why is source code always in files only?)

In reply to this post by Nicolas Cellier

On 20-01-2015, at 2:22 AM, Nicolas Cellier <[hidden email]> wrote:

>
>
> 2015-01-20 2:18 GMT+01:00 Eliot Miranda <[hidden email]>:
>
>
> On Mon, Jan 19, 2015 at 5:09 PM, Levente Uzonyi <[hidden email]> wrote:
> On Mon, 19 Jan 2015, Eliot Miranda wrote:
>
>
>
> On Mon, Jan 19, 2015 at 1:31 PM, Nicolas Cellier <[hidden email]> wrote:
> Hi Tobias,
> are you aware of CurrentReadOnlySourceFiles cacheDuring: [...]
> This is to work around the readOnlyCopy used for thread safety, which is the main killer of performance…
>

I’ve no idea if concurrency is the actual issue in your problem, but the simple way to solve any issues with thread-related access to files is to rewrite the damned FilePlugin to use a proper API. The separate set-pos & read-bytes + write-bytes is stupid. Use a read-bytes-from-this-pos type API. At least that way you have some confidence you’ll get the bytes you thought you wanted. Not to mention that, as has been discussed a gazillion times already, FilePlugin and the general file/dir handling in Squeak is extremely grungy.
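Tim's read-bytes-from-this-pos API would be a FilePlugin (VM) change. At the image level, the closest approximation is probably to make "set position, then read" a single critical section. The sketch below uses a ReadStream on a string as a stand-in for the shared sources file, and `readAt` is a hypothetical helper, not an existing Squeak method. Because the position and the read are guarded together, no other process can move the position between the two:

```smalltalk
| sharedFile guard readAt |
"Stand-in for a sources file shared by several processes."
sharedFile := ReadStream on: 'aaaa|bbbb|cccc'.
guard := Mutex new.
"Hypothetical pread-style helper: position and read happen under one lock."
readAt := [:position :count |
	guard critical: [
		sharedFile position: position.
		sharedFile next: count]].
readAt value: 5 value: 4
```

Every reader still pays for taking the lock. A VM-level positioned read, as Tim suggests, would avoid the lock and the shared position entirely.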

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Strange OpCodes: EFB: Emulate Five-volt Battery mode

Eliot Miranda-2

Re: Read-only source files (was: Re: [squeak-dev] Why is source code always in files only?)

In reply to this post by Nicolas Cellier

On Tue, Jan 20, 2015 at 2:22 AM, Nicolas Cellier <[hidden email]> wrote:

2015-01-20 2:18 GMT+01:00 Eliot Miranda <[hidden email]>:

On Mon, Jan 19, 2015 at 5:09 PM, Levente Uzonyi <[hidden email]> wrote:
On Mon, 19 Jan 2015, Eliot Miranda wrote:

On Mon, Jan 19, 2015 at 1:31 PM, Nicolas Cellier <[hidden email]> wrote:
Hi Tobias,
are you aware of CurrentReadOnlySourceFiles cacheDuring: [...]
This is to work around the readOnlyCopy used for thread safety, which is the main killer of performance...

IMO this is a bug. We should simply have a single read-only copy of each sources file and modify the debugger to either save and restore the state of a read-only copy around accessing source, or use its own
read-only copy (except that the latter approach breaks when one debugs the debugger). The difference in performance between using CurrentReadOnlySourceFiles cacheDuring: [...] and not using it in anything that
accesses source is huge. And CurrentReadOnlySourceFiles cacheDuring: [...] is a /lot/ of verbiage to type in doits, and a sign that something is wrong.

How would using a single copy solve the concurrency issues?

It wouldn't, but what issues are you seeing in concurrent source access? VW doesn't even have read-only copies and AFAICR we never had complaints about this. Is there really a thread-safety issue here?

Couldn't there be a difference because VW properly handles a read-append stream for the underlying FILE?
I have the feeling that such a FILE is more robust to single-writer/multiple-reader concurrency than a random-access read-write FILE as used by Squeak...

Maybe. And Tim's point about the file API is well taken, but it's much more work at the Smalltalk level than at the VM level.

However, I still want to see evidence of the potential thread-safety issues. Access to the source files for writing is not thread-safe anyway, but no one complains about that. So I suspect we may be trying to fix a problem that doesn't really exist. IMO fast access to source is far preferable to thread-safety. If someone wants thread-safe access, they can roll their own solution (e.g. install a wrapper hiding a mutex around the source files). The common case of accessing source should be fast. The difference between

self systemNavigation allSelect: [:m| m getSourceFromFile asString includesSubString: 'not likely']

and

CurrentReadOnlySourceFiles cacheDuring: [self systemNavigation allSelect: [:m| m getSourceFromFile asString includesSubString: 'not likely']]

is 34 to 1 (!!), 28 seconds vs 0.8.

:-(

I think the real solution would be to use per-process copies, which would be initialized lazily and closed automatically after some period of inactivity.

If concurrent access was really an issue then OK. But first I'd like some evidence that there's a real problem here.
--
best,
Eliot

best,

Eliot

Levente Uzonyi-2

Re: Read-only source files (was: Re: [squeak-dev] Why is source code always in files only?)

On Tue, 20 Jan 2015, Eliot Miranda wrote:

>
>
> On Tue, Jan 20, 2015 at 2:22 AM, Nicolas Cellier <[hidden email]> wrote:
>
>
> 2015-01-20 2:18 GMT+01:00 Eliot Miranda <[hidden email]>:
>
>
> On Mon, Jan 19, 2015 at 5:09 PM, Levente Uzonyi <[hidden email]> wrote:
> On Mon, 19 Jan 2015, Eliot Miranda wrote:
>
>
>
> On Mon, Jan 19, 2015 at 1:31 PM, Nicolas Cellier <[hidden email]> wrote:
> Hi Tobias,
> are you aware of CurrentReadOnlySourceFiles cacheDuring: [...]
> This is to work around the readOnlyCopy used for thread safety, which is the main killer of performance...
>
>
> IMO this is a bug. We should simply have a single read-only copy of each sources file and modify the debugger to either save and restore the state of a read-only copy around accessing source, or use its own read-only copy (except that the latter approach breaks when one debugs the debugger). The difference in performance between using CurrentReadOnlySourceFiles cacheDuring: [...] and not using it in anything that accesses source is huge. And CurrentReadOnlySourceFiles cacheDuring: [...] is a /lot/ of verbiage to type in doits, and a sign that something is wrong.
>
>
> How would using a single copy solve the concurrency issues?
>
>
> It wouldn't, but what issues are you seeing in concurrent source access? VW doesn't even have read-only copies and AFAICR we never had complaints about this. Is there really a thread-safety
> issue here?
>
>
>
> Couldn't there be a difference because VW properly handles a read-append stream for the underlying FILE?
> I have the feeling that such a FILE is more robust to single-writer/multiple-reader concurrency than a random-access read-write FILE as used by Squeak...
>
>
> Maybe. And Tim's point about the file API is well taken, but it's much more work at the Smalltalk level than at the VM level.
>
> However, I still want to see evidence of the potential thread-safety issues. Access to the source files for writing is not thread-safe anyway, but no one complains about that. So I suspect we may be trying
> to fix a problem that doesn't really exist. IMO fast access to source is far preferable to thread-safety. If someone wants thread-safe access, they can roll their own solution (e.g.
> install a wrapper hiding a mutex around the source files). The common case of accessing source should be fast. The difference between

I couldn't find out what the exact case was, but some tools (e.g. Slint)
use background processes to examine a set of classes and their methods.

>
> self systemNavigation allSelect: [:m| m getSourceFromFile asString includesSubString: 'not likely']
>
> and
>
> CurrentReadOnlySourceFiles cacheDuring: [self systemNavigation allSelect: [:m| m getSourceFromFile asString includesSubString: 'not likely']]
>
> is 34 to 1 (!!), 28 seconds vs 0.8.

I guess your numbers include the time it takes for the OS to read the data
from the disk. This is added to the first test of the first run, but it's
not added to the second test, because then the files are already cached.

The numbers on my machine - ignoring the first run - are 3.215s and
0.685s, which means 4.7x speedup.
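A timing doit along the lines of Levente's method might look like the sketch below. The search is Eliot's query from above, written with `SystemNavigation default` in place of `self systemNavigation`. One unmeasured warm-up pass runs first so that the OS already holds the source files in its cache when both variants are timed. The temporaries `search`, `uncached` and `cached` are illustrative names:

```smalltalk
| search uncached cached |
"The query timed above."
search := [SystemNavigation default allSelect: [:m |
	m getSourceFromFile asString includesSubString: 'not likely']].
"Unmeasured warm-up, so disk reads do not inflate whichever variant runs first."
CurrentReadOnlySourceFiles cacheDuring: search.
uncached := Time millisecondsToRun: search.
cached := Time millisecondsToRun: [CurrentReadOnlySourceFiles cacheDuring: search].
"Milliseconds without and with the cache, and their ratio."
{ uncached. cached. (uncached / (cached max: 1)) asFloat }
```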

Levente

>
> :-(
>
>
> I think the real solution would be to use per-process copies, which would be initialized lazily and closed automatically after some period of inactivity.
>
>
> If concurrent access was really an issue then OK. But first I'd like some evidence that there's a real problem here.
> --
> best,
> Eliot

Eliot Miranda-2

Re: Read-only source files (was: Re: [squeak-dev] Why is source code always in files only?)

Hi Levente,

On Tue, Jan 20, 2015 at 1:31 PM, Levente Uzonyi <[hidden email]> wrote:

On Tue, 20 Jan 2015, Eliot Miranda wrote:

On Tue, Jan 20, 2015 at 2:22 AM, Nicolas Cellier <[hidden email]> wrote:

2015-01-20 2:18 GMT+01:00 Eliot Miranda <[hidden email]>:

On Mon, Jan 19, 2015 at 5:09 PM, Levente Uzonyi <[hidden email]> wrote:
On Mon, 19 Jan 2015, Eliot Miranda wrote:

On Mon, Jan 19, 2015 at 1:31 PM, Nicolas Cellier <[hidden email]> wrote:
Hi Tobias,
are you aware of CurrentReadOnlySourceFiles cacheDuring: [...]
This is to work around the readOnlyCopy used for thread safety, which is the main killer of performance...

IMO this is a bug. We should simply have a single read-only copy of each sources file and modify the debugger to either save and restore the state of a read-only copy around accessing source, or use its own read-only copy (except that the latter approach breaks when one debugs the debugger). The difference in performance between using CurrentReadOnlySourceFiles cacheDuring: [...] and not using it in anything that accesses source is huge. And CurrentReadOnlySourceFiles cacheDuring: [...] is a /lot/ of verbiage to type in doits, and a sign that something is wrong.

How would using a single copy solve the concurrency issues?

It wouldn't, but what issues are you seeing in concurrent source access? VW doesn't even have read-only copies and AFAICR we never had complaints about this. Is there really a thread-safety
issue here?

Couldn't there be a difference because VW properly handles a read-append stream for the underlying FILE?
I have the feeling that such a FILE is more robust to single-writer/multiple-reader concurrency than a random-access read-write FILE as used by Squeak...

Maybe. And Tim's point about the file API is well taken, but it's much more work at the Smalltalk level than at the VM level.

However, I still want to see evidence of the potential thread-safety issues. Access to the source files for writing is not thread-safe anyway, but no one complains about that. So I suspect we may be trying
to fix a problem that doesn't really exist. IMO fast access to source is far preferable to thread-safety. If someone wants thread-safe access, they can roll their own solution (e.g.
install a wrapper hiding a mutex around the source files). The common case of accessing source should be fast. The difference between

I couldn't find out what the exact case was, but some tools (e.g. Slint) use background processes to examine a set of classes and their methods.

self systemNavigation allSelect: [:m| m getSourceFromFile asString includesSubString: 'not likely']

and

CurrentReadOnlySourceFiles cacheDuring: [self systemNavigation allSelect: [:m| m getSourceFromFile asString includesSubString: 'not likely']]

is 34 to 1 (!!), 28 seconds vs 0.8.

I guess your numbers include the time it takes for the OS to read the data from the disk. This is added to the first test of the first run, but it's not added to the second test, because then the files are already cached.

The numbers on my machine - ignoring the first run - are 3.215s and 0.685s, which means 4.7x speedup.

Good point. When I run the caching version first, my times are

640ms vs 6589ms, 10.3 to 1. That's still very large.

Levente

:-(

I think the real solution would be to use per-process copies, which would be initialized lazily and closed automatically after some period of inactivity.

If concurrent access was really an issue then OK. But first I'd like some evidence that there's a real problem here.
--
best,
Eliot

--
best,
Eliot

best,

Eliot

timrowledge

Re: Read-only source files (was: Re: [squeak-dev] Why is source code always in files only?)

On 20-01-2015, at 3:01 PM, Eliot Miranda <[hidden email]> wrote:
>
> I couldn't find out what the exact case was, but some tools (e.g. Slint) use background processes to examine a set of classes and their methods.

Doesn’t Shout read the sources when colouring them?

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
"Bother!" said Pooh, searching for the $10m winning lottery ticket.