Can we extract type information from the VM?

Frank Shearar-3

Can we extract type information from the VM?

I was rereading Phlip's "what's wrong with our IDEs" post -
http://www.oreillynet.com/onlamp/blog/2008/05/dynamic_languages_vs_editors.html
- and realised that he's just verbalised something I've only
half-thought.

When we run our tests (because of course we're using TDD) we know the
precise types/expected classes of everything, because the VM
automatically collects (or can collect) this information.

But how do we get that information out of the VM?

frank

Florin Mateoc-4

Re: Can we extract type information from the VM?

You don't need to extract it from the VM, you can have a type profiler that collects it for you in the image.

Florin

Frank Shearar-3

Re: Can we extract type information from the VM?

On 15 Sep 2013, at 14:57, Florin Mateoc <[hidden email]> wrote:

> You don't need to extract it from the VM, you can have a type profiler that collects it for you in the image.

Doesn't that just mean twice as much work? The VM of necessity has already typed the call sites (even if the typing is only eventually correct). Why could a mirror not expose the typing thus far?

frank

Florin Mateoc-4

Re: Can we extract type information from the VM?

On 9/15/2013 11:47 AM, Frank Shearar wrote:

> Doesn't that just mean twice as much work? The VM of necessity has already typed the call sites (even if the typing is only eventually correct). Why could a mirror not expose the typing thus far?

Doing it in the image means you do it in Smalltalk. Extracting it from the VM means you are doing it in C/assembly.
And I definitely do not understand the "twice as much work" argument. Work for whom? For the computer? Well, that's
its job. As the developer, you only do it once, regardless of which option you choose. I prefer doing it in Smalltalk.

Frank Shearar-3

Re: Can we extract type information from the VM?

On 15 September 2013 17:38, Florin Mateoc <[hidden email]> wrote:

> Doing it in the image means you do it in Smalltalk. Extracting it from the VM means you are doing it in C/assembly.
> And I definitely do not understand the argument with twice as much work. Work for whom? For the computer? Well, that's
> its job. As the developer, you only do it once, regardless which option you chose. I prefer doing it in Smalltalk

Well, someone has to write the code to collect and extract the information.

Unless I've completely misunderstood you, you're saying I should build
an interpreter within which to run my tests, and that collects this
type information. I'm saying that the VM has to do this _already_ and
exposing this information to the image (through a mirror or similar)
means that (a) you get accurate type information and (b) you don't
have to write an interpreter.

How would a type profiler collect information at least as accurately
as the VM already does?

frank

Bob Arning-2

Re: Can we extract type information from the VM?

I'm not clear on what you are suggesting. Dispatching a message does require the VM knowing the class of the receiver, but how and where the VM might collect that information is not clear. Perhaps an example would help.

Cheers,
Bob

Florin Mateoc-4

Re: Can we extract type information from the VM?

In reply to this post by Frank Shearar-3

On 9/15/2013 1:06 PM, Frank Shearar wrote:

> Well, someone has to write the code to collect and extract the information.
>
> Unless I've completely misunderstood you, you're saying I should build
> an interpreter within which to run my tests, and that collects this
> type information. I'm saying that the VM has to do this _already_ and
> exposing this information to the image (through a mirror or similar)
> means that (a) you get accurate type information and (b) you don't
> have to write an interpreter.
>
> How would a type profiler collect information at least as accurately
> as the VM already does?

No, I did not mean an interpreter. I meant using the existing sampling profiler infrastructure, to which an additional
kind of profiler can easily be added (alongside the timing and allocation profilers). I did such an exercise in VW
(VisualWorks).
As for accuracy, how can a profiler collect less accurate type information for the same run of the same code? (Of
course, you have to take care of proxies and such.) In the worst case, for short-running methods where sampling could
leave the type information incomplete and you don't want to wait for multiple runs, you can use method wrappers and do
an exact collection.
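
The method-wrapper approach can be sketched roughly as follows. This is only an illustration, written in Python rather than Smalltalk, and it is not taken from any VisualWorks or Squeak package; the names `harvest_types`, `observed` and `scale` are hypothetical. The wrapper records the class of every argument and of the result on each call, so the collection is exact for the code that actually ran.

```python
import functools
from collections import defaultdict

# observed[function name][slot name] -> set of class names seen in that slot
# (hypothetical storage layout, chosen only for this sketch)
observed = defaultdict(lambda: defaultdict(set))


def harvest_types(func):
    """Wrap func so each call records the classes of its arguments and result."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        slots = observed[func.__name__]
        for index, arg in enumerate(args):
            slots[f"arg{index}"].add(type(arg).__name__)
        for name, arg in kwargs.items():
            slots[name].add(type(arg).__name__)
        result = func(*args, **kwargs)
        # Only reached when the call returns normally.
        slots["return"].add(type(result).__name__)
        return result
    return wrapper


@harvest_types
def scale(value, factor):
    return value * factor


# Stand-in for "running the tests": every call is observed, none is sampled.
scale(2, 3)
scale(1.5, 2)
scale("ab", 2)

print(sorted(observed["scale"]["arg0"]))    # ['float', 'int', 'str']
print(sorted(observed["scale"]["arg1"]))    # ['int']
print(sorted(observed["scale"]["return"]))  # ['float', 'int', 'str']
```

Unlike a sampling profiler, a wrapper like this sees every call, at the cost of slowing each wrapped method down.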

Florin

Eliot Miranda-2

Re: Can we extract type information from the VM?

In reply to this post by Frank Shearar-3

Part of my Sista design (Speculative Inlining Smalltalk Architecture) is access to the inline caches through a primitive on CompiledMethod. But that primitive is only available in an experimental VM. I could add it to the standard VM though. Let me know if you've energy enough to explore this.

--
best,

Eliot

Eliot Miranda-2

Re: Can we extract type information from the VM?

In reply to this post by Bob Arning-2

On Sun, Sep 15, 2013 at 10:17 AM, Bob Arning <[hidden email]> wrote:

> I'm not clear on what you are suggesting. Dispatching a message does require the VM knowing the class of the receiver, but how and where the VM might collect that information is not clear. Perhaps an example would help.

The JIT uses inline caches at send sites to optimize sends. These tell you

- whether a send has been executed; if a send has never been executed the send site will be unlinked with no cache data.

- whether the send has been sent to a single class of receiver, and what that class is; if so, a send site will be linked to a method and have one class entry in the inline cache.

- whether the send has been sent to a small number of classes of receiver (in Cog up to 6), and what these are; if so the send site will be linked to a "closed" Polymorphic Inline Cache with up to 6 class entries.

- whether the send has been sent to more than 6 classes; if so the site will be linked to an "open" polymorphic inline cache, which is a first-level method lookup cache probe with no classes cached.

So the VM, in optimizing sends, collects type data on send sites: each site is untaken, monomorphic, polymorphic or megamorphic. This is the basis of adaptive optimization in VMs such as HotSpot and V8. After Spur, this is the next target for Cog.
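
The four send-site states described above can be modelled as a small state machine. This is a sketch in Python for illustration only, not Cog's implementation; the class `SendSite` and its methods are hypothetical, and the only detail taken from the description is the limit of 6 entries for a closed PIC.

```python
from dataclasses import dataclass, field

MAX_CLOSED_PIC_ENTRIES = 6  # the Cog limit quoted above


@dataclass
class SendSite:
    """Toy model of one send site's inline cache (hypothetical, not Cog code)."""
    selector: str
    classes: list = field(default_factory=list)  # receiver classes cached so far
    megamorphic: bool = False

    def record(self, receiver):
        """Update the cache as if the send had just executed with this receiver."""
        if self.megamorphic:
            return  # an open PIC caches no classes
        cls = type(receiver)
        if cls in self.classes:
            return  # cache hit, nothing new to learn
        if len(self.classes) == MAX_CLOSED_PIC_ENTRIES:
            # A seventh distinct class: relink to an open PIC and drop the entries.
            self.classes.clear()
            self.megamorphic = True
        else:
            self.classes.append(cls)

    def state(self):
        if self.megamorphic:
            return "megamorphic"
        if not self.classes:
            return "untaken"
        if len(self.classes) == 1:
            return "monomorphic"
        return "polymorphic"


site = SendSite("printOn:")
states = [site.state()]
for receiver in (1, "a", 2.0, [], {}, (), set()):
    site.record(receiver)
    states.append(site.state())

print(states[0], states[1], states[2], states[-1])
# untaken monomorphic polymorphic megamorphic
```

The model also shows why the data is only "eventually correct": once a site goes megamorphic the class list is gone, so a mirror onto the caches could report that a site is megamorphic but not which classes reached it.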

See e.g. build me a jit for gory details.

HTH

--
best,

Eliot

Michael Perscheid

Re: Can we extract type information from the VM?

Hi all,

After my PhD is done, we will publish its results (the Path tools framework) to the Squeak community in the near future. Among other things, we have implemented a "type harvester" that collects type information from running (passing) test cases. This information will be presented within a browser extension (label, see screenshot).

For more information have a look at the following papers:

Type Harvester: http://michaelperscheid.de/publications/papers/HauptPerscheidHirschfeld_2011_TypeHarvestingAPracticalApproachToObtainingTypingInformationInDynamicProgrammingLanguages_AcmDL.pdf

Path Tools: http://michaelperscheid.de/publications/papers/PerscheidHauptHirschfeldMasuhara_2012b_TestDrivenFaultNavigationForDebuggingReproducibleFailures_JSSST.pdf

Stay tuned :-)

Best,

Michael

---
Michael Perscheid
[hidden email]

http://www.michaelperscheid.de/