String concatenation -> Object copying implementation: indexed and instance variables


Les Hazlewood
I thought I'd give a shot at implementing the String concatenation message ',', so I went digging through Pharo's code for examples.  Implementations eventually delegated to the Object class's copy implementation, which makes sense: when you concatenate, you combine a copy of the receiver (you don't modify the receiver) with the argument.  The Object class's copy implementation internally delegates to selfCopy.  
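To illustrate the "copy of the receiver combined with the argument" idea outside of Smalltalk, here is a minimal Java sketch. It is only an illustration under assumed names: ConcatSketch and concat are made up for this example and are not part of Pharo or Redline.

```java
import java.util.Arrays;

// Hypothetical helper showing what ',' does conceptually for an
// indexed collection: the receiver is copied, never modified.
final class ConcatSketch {
    // Answers a new array holding the receiver's elements followed by
    // the argument's elements.
    static Object[] concat(Object[] receiver, Object[] argument) {
        // Copy the receiver into a larger Object[] (the receiver stays untouched).
        Object[] result = Arrays.copyOf(
                receiver, receiver.length + argument.length, Object[].class);
        // Append the argument's elements after the copied part.
        System.arraycopy(argument, 0, result, receiver.length, argument.length);
        return result;
    }
}
```

Calling ConcatSketch.concat(a, b) leaves both a and b unchanged and answers a third array.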

The following Pharo implementation assumes the existence of indexed variables (the class isVariable branch, shown in blue in the original post) and instance variables (the final instSize loop, shown in green):

| class newObject index |
<primitive: 148>
class := self class.
class isVariable
    ifTrue:
        [index := self basicSize.
        newObject := class basicNew: index.
        [index > 0]
            whileTrue:
                [newObject basicAt: index put: (self basicAt: index).
                index := index - 1]]
    ifFalse: [newObject := class basicNew].
index := class instSize.
[index > 0]
    whileTrue:
        [newObject instVarAt: index put: (self instVarAt: index).
        index := index - 1].

^ newObject
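For readers more at home in Java, the two copy loops above can be sketched roughly as follows. This is an illustration only, not Pharo or Redline code: SketchObject, namedSlots, and indexedSlots are hypothetical names standing in for an object with instSize named slots and basicSize indexed slots.

```java
// Hypothetical stand-in for a Smalltalk object: a fixed set of named
// (instance variable) slots plus an optional run of indexed slots.
final class SketchObject {
    final Object[] namedSlots;    // roughly instVarAt: 1 .. instSize
    final Object[] indexedSlots;  // roughly basicAt: 1 .. basicSize (empty if not variable)

    SketchObject(int instSize, int basicSize) {
        this.namedSlots = new Object[instSize];
        this.indexedSlots = new Object[basicSize];
    }

    // Shallow copy: a new object whose slots refer to the SAME values
    // as the receiver's slots (the values themselves are not copied).
    SketchObject shallowCopy() {
        SketchObject copy = new SketchObject(namedSlots.length, indexedSlots.length);
        // Indexed part (the isVariable branch in the Smalltalk code).
        for (int i = indexedSlots.length - 1; i >= 0; i--) {
            copy.indexedSlots[i] = indexedSlots[i];
        }
        // Named part (the instSize loop in the Smalltalk code).
        for (int i = namedSlots.length - 1; i >= 0; i--) {
            copy.namedSlots[i] = namedSlots[i];
        }
        return copy;
    }
}
```

Changing a slot in the copy does not affect the original, but both objects still share the values their slots point to.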


This leads to my questions:

In the blue code above, Redline's Behavior class does have a basicSize message.
However, Behavior (or Class) does not have a working basicNew: message (in Pharo it is implemented by <primitive: 71>, which Redline apparently does not have yet).

In the green code, the object can be inspected for its instance variables, but Redline can't do this at the moment.

James or the other Redline dev team members, can you please shed some light as to what is going on here and how we might implement shallowCopy within Redline?  What kind of support might be expected for indexed vs instance variable introspection?

Thanks!

Les

Re: String concatenation -> Object copying implementation: indexed and instance variables

Les Hazlewood

The Object class's copy implementation internally delegates to selfCopy.  

Sorry, I meant shallowCopy.

Re: String concatenation -> Object copying implementation: indexed and instance variables

James Ladd
Hi Les,

Thank you for looking into the implementation of ','.

The Pharo source calls <primitive: 148>, thus the implementation
in Redline for ',' should be in Java as method p148()

You will have trouble with implementing the primitive right now.
There are some big changes I have yet to commit which will fix
this and other issues - in one big bang. That's the theory (pun intended).

Please try to implement some other functionality that may not rely on a
primitive.

I'm sorry to slow you down as having people work on the Smalltalk classes is
crucial. However, the changes are a direct result of the early experiences of
people doing these classes. I'm spending the next 7 days on nothing but
Redline!

- James.


Re: String concatenation -> Object copying implementation: indexed and instance variables

Les Hazlewood
No worries James, I'm glad you'll be able to address these things soon enough.

But I guess my last email was more about the concept of indexed vs instance variables - how are (or will) these be addressed?

Finally, what exactly do <primitive: 148> and <primitive: 71> do?

You see <primitive: 148> in the Object shallowCopy implementation I pasted, but I don't know what it does.   I'm assuming the number indicates a particular VM-level instruction.  Are the numbers standard?  Or are they arbitrarily chosen per Smalltalk implementation?

Thanks for your continued help and explanations!

Les

Re: String concatenation -> Object copying implementation: indexed and instance variables

James Ladd
Response inline ... keep the questions coming. I need this knowledge spread :)
- James.

On Sun, Dec 30, 2012 at 9:46 AM, Les Hazlewood <[hidden email]> wrote:
No worries James, I'm glad you'll be able to address these things soon enough. 

I hope after 7 days to have jumped Redline forward significantly.
 
But I guess my last email was more about the concept of indexed vs instance variables - how are (or will) these be addressed?

Each object can have a corresponding Java value. This value will be used to represent
indexed variables as appropriate. For example, a Smalltalk Array may be backed by
a Java List. At some place in the hierarchy there will need to be primitives specific to
Redline that make special use of the Java value.

In the next update (which I am working on) instance variables will be held in a Java
Map internal to each Object (Not in the JavaValue field).
 

Finally, what exactly do <primitive: 148> and <primitive: 71> do?
 
Primitive 148 = Not sure, it isn't in the Blue Book (see below)
Are you sure this is the number as I can't find it in the sources?

Primitive 70/71 create a new instance of a class. This will include initializing instance
variables and assigning a class instance.
 
You see <primitive: 148> in the Object shallowCopy implementation I pasted, but I don't know what it does.   I'm assuming the number indicates a particular VM-level instruction.  Are the numbers standard?  Or are they arbitrarily chosen per Smalltalk implementation?

On page 613 of Smalltalk-80: The Language and its Implementation (the Blue Book)
there is a formal definition of the primitive methods. This lists 127 primitives.
Redline is copying the primitives where it makes sense. For example, primitive 70
is basicNew and we will copy this; however, primitive 96 relates to BitBlt so we won't
copy that.

Generally we will make new specific to Redline primitives start at 130.

I think different Smalltalks have kept to the Blue Book definition but added their own where gaps exist.


Thanks for your continued help and explanations!

Thank you!

Les


Re: String concatenation -> Object copying implementation: indexed and instance variables

Les Hazlewood
 
But I guess my last email was more about the concept of indexed vs instance variables - how are (or will) these be addressed?

Each object can have a corresponding Java value. This value will be used to represent
indexed variables as appropriate. For example, a Smalltalk Array may be backed by
a Java List. At some place in the hierarchy there will need to be primitives specific to
Redline that make special use of the Java value.

I think I understand, but I could be way off - could you please address the following?

Here's my current (Java-ish) mental model of what I think Smalltalk instance vs. indexed variables might be:

When I think of instance variables, I currently think of them (e.g. in Java land) as the built-in named instance attributes, e.g.

public class Person {
    private String name;
    private Date birthDate;
    ...
}

Here I assume that name and birthDate are parallel to Smalltalk instance variables and they're fundamentally built in to the class definition and probably couldn't be modified (but who knows - this is Smalltalk after all and you probably could modify anything).

I would then think of indexed variables as something that can be added or removed dynamically at runtime, probably backed internally by a Map.  e.g.:

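import java.util.HashMap;
import java.util.Map;

public class Person {
    ...
    // values that can be added or removed by name at runtime
    Map<String,Object> indexedVariables = new HashMap<String,Object>();

    public Object get(String name) {
        return indexedVariables.get(name);
    }
    
    public void set(String name, Object value) {
        // java.util.Map has put(), not set()
        indexedVariables.put(name, value);
    }
}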
 
And then I could do something like person.set("taxId", "123-45-6789"); and person.set("gender", "male");

Here, taxId and gender might be indexed attributes: they are indexed by their name and the backing indexedVariables collection can grow/shrink at runtime as desired.

It appears that the Smalltalk Object class (at least in Pharo) supports both concepts as implied by the shallowCopy implementation (unless my mental model is entirely wrong, which is totally possible).

Is this mental model/representation correct?  Or am I way off base?  When defining a class, when would you use either approach (i.e. why do both exist and not just one?)

In the next update (which I am working on) instance variables will be held in a Java
Map internal to each Object (Not in the JavaValue field).

Instance variables?  Or indexed variables?
 

Finally, what exactly do <primitive: 148> and <primitive: 71> do?
 
Primitive 148 = Not sure, it isn't in the Blue Book (see below)
Are you sure this is the number as I can't find it in the sources?

I'm currently looking at Pharo 1.3's source - Object class, shallowCopy instance method, line 5: <primitive: 148>.  It must be Pharo/Squeak specific then.
 
Primitive 70/71 create a new instance of a class. This will include initializing instance
variables and assigning a class instance.

I'm learning here, but why is this done as a <primitive>?  Isn't that the purpose of the class and instance initialize methods?  (I'm going to read the Blue Book on primitives, so if the answer is lengthy, don't worry about it ;)).
 
 
You see <primitive: 148> in the Object shallowCopy implementation I pasted, but I don't know what it does.   I'm assuming the number indicates a particular VM-level instruction.  Are the numbers standard?  Or are they arbitrarily chosen per Smalltalk implementation?

On page 613 of Smalltalk-80: The Language and its Implementation (the Blue Book)
there is a formal definition of the primitive methods. This lists 127 primitives.
Redline is copying the primitives where it makes sense. For example, primitive 70
is basicNew and we will copy this; however, primitive 96 relates to BitBlt so we won't
copy that.

Fantastic - thanks so much for the reference!  I'll read this asap.

Regards,

Les

Re: String concatenation -> Object copying implementation: indexed and instance variables

James Ladd
More good questions - thank you :)
Addressed inline below ....
- James.

On Sun, Dec 30, 2012 at 3:59 PM, Les Hazlewood <[hidden email]> wrote:
 
In the next update (which I am working on) instance variables will be held in a Java
Map internal to each Object (Not in the JavaValue field).

Instance variables?  Or indexed variables?
 
Ok -

In Redline every object is an instance (based on the changes I'm making) of a class that
looks like this in Java:

public class MetaObject {
  public MetaObject selfclass;
  public MetaObject superclass;
  public Map<String, MetaObject> variables;
  public Object javaValue;
}

When you say "subclass: #MyThing instanceVariableNames: 'foo bar' ..."
the names 'foo' and 'bar' will be added as instance variables in the 'variables' map.
When the class contains an 'indexed' variable then the 'javaValue' will contain
the appropriate Java value to hold the indexed elements,
most likely an ArrayList<MetaObject>.

In your example above "taxId" would be a key in the variables map, and
taxId := '123-45-6789' would set it.
basicAt: and basicAt:put: will be overridden in subclasses to use either
the variables map or javaValue.
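As a rough sketch of how those pieces could fit together: the class below reuses the four field names from the outline above, but everything else (the name MetaObjectSketch, newIndexed, setVariable, getVariable, basicAt, basicAtPut, basicSize) is hypothetical and only meant to illustrate the idea, not to describe the actual Redline code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: the fields follow the MetaObject outline,
// every method is an assumption made for this example.
class MetaObjectSketch {
    MetaObjectSketch selfclass;
    MetaObjectSketch superclass;
    Map<String, MetaObjectSketch> variables = new HashMap<>(); // named instance variables
    Object javaValue;                                          // backing Java value

    // Creates an object with 'size' indexed elements, all initially null.
    static MetaObjectSketch newIndexed(int size) {
        MetaObjectSketch object = new MetaObjectSketch();
        object.javaValue = new ArrayList<MetaObjectSketch>(
                Collections.nCopies(size, (MetaObjectSketch) null));
        return object;
    }

    // Roughly what "taxId := value" would do.
    void setVariable(String name, MetaObjectSketch value) { variables.put(name, value); }

    MetaObjectSketch getVariable(String name) { return variables.get(name); }

    @SuppressWarnings("unchecked")
    private List<MetaObjectSketch> elements() { return (List<MetaObjectSketch>) javaValue; }

    // Number of indexed elements; 0 when javaValue is not a List.
    int basicSize() { return javaValue instanceof List ? elements().size() : 0; }

    // Smalltalk indexes start at 1, Java lists at 0.
    MetaObjectSketch basicAt(int index) { return elements().get(index - 1); }

    void basicAtPut(int index, MetaObjectSketch value) { elements().set(index - 1, value); }
}
```

Named variables go through the map and indexed elements go through javaValue, so the two kinds of state never share storage.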


Finally, what exactly do <primitive: 148> and <primitive: 71> do?
 
Primitive 148 = Not sure, it isn't in the Blue Book (see below)
Are you sure this is the number as I can't find it in the sources?

I'm currently looking at Pharo 1.3's source - Object class, shallowCopy instance method, line 5: <primitive: 148>.  It must be Pharo/Squeak specific then.

Yes - I think they are Pharo/Squeak specific.
 
Primitive 70/71 create a new instance of a class. This will include initializing instance
variables and assigning a class instance.

I'm learning here, but why is this done as a <primitive>?  Isn't that the purpose of the class and instance initialize methods?  (I'm going to read the Blue Book on primitives, so if the answer is lengthy, don't worry about it ;)).

The idea in the Pharo implementation is that should a primitive fail, the following code
can be executed instead to provide the implementation. In Redline we are assuming
the primitive won't fail. We are also assuming the primitive will be faster.
Hence we don't follow our primitives with a possible implementation in Smalltalk.
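The "try the primitive, otherwise run the code that follows it" idea might look roughly like this on the Java side. It is a sketch under assumptions: PrimitiveSketch, register, and invoke are hypothetical names (not Redline or Pharo APIs), and an empty Optional is just one assumed way of signalling that a primitive failed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical primitive table: primitive number -> handler.
// A handler answers Optional.empty() to signal "primitive failed".
final class PrimitiveSketch {
    private final Map<Integer, Function<Object, Optional<Object>>> table = new HashMap<>();

    void register(int number, Function<Object, Optional<Object>> handler) {
        table.put(number, handler);
    }

    // Try the primitive first; run the fallback (the code that follows
    // the <primitive: n> line) only if the primitive is missing or fails.
    Object invoke(int number, Object receiver, Function<Object, Object> fallback) {
        Function<Object, Optional<Object>> primitive = table.get(number);
        if (primitive != null) {
            Optional<Object> result = primitive.apply(receiver);
            if (result.isPresent()) {
                return result.get();
            }
        }
        return fallback.apply(receiver);
    }
}
```

Under the Redline assumption that primitives never fail, the fallback argument would simply never be used.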
 
 
 
You see <primitive: 148> in the Object shallowCopy implementation I pasted, but I don't know what it does.   I'm assuming the number indicates a particular VM-level instruction.  Are the numbers standard?  Or are they arbitrarily chosen per Smalltalk implementation?

On page 613 of Smalltalk-80: The Language and its Implementation (the Blue Book)
there is a formal definition of the primitive methods. This lists 127 primitives.
Redline is copying the primitives where it makes sense. For example, primitive 70
is basicNew and we will copy this; however, primitive 96 relates to BitBlt so we won't
copy that.

Fantastic - thanks so much for the reference!  I'll read this asap.

The Blue Book is what I am following for the 'purity' of Smalltalk, however, we
then look at Pharo to get the reality of implementation and to ensure that we
are Pharo compatible as we implement the base methods defined by the
Blue Book.
 

Regards,

Les


Re: String concatenation -> Object copying implementation: indexed and instance variables

JONNALAGADDA Srinivas
        I have a query, James, on the choice of putting a map in every object!

On Sunday, 30 December 2012 12:21:30 UTC+5:30, jamesl wrote:
In Redline every object is an instance (based on the changes I'm making) of a class that
looks like this in Java:

public class MetaObject {
  public MetaObject selfclass;
  public MetaObject superclass;
  public Map<String, MetaObject> variables;
  public Object javaValue;
}

        Have you run into any particular problems with the current approach where the class answers the index into the attributes array of the object, when given the name of the attribute?  What particular advantages do you foresee with each object having a map instead?

        I ask this because putting a map in every object has two consequences:
(a) it increases the per-object allocation cost, potentially leading to more frequent GCs, and
(b) it forces a (hash operation + lookup) for every attribute access.

        In the current scenario, the JVM may be able to cache the answers from repeated invocations of PrimObjectClass#indexOfVariable (memoization).  The actual attribute access, evidently, does not require a hash operation, but only an offset-based retrieval.
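To make the comparison concrete, here is a rough sketch of the index-based scheme: the class owns one name-to-index table shared by all instances, and each object only holds a plain array. The names ClassLayoutSketch and SlotObjectSketch are hypothetical and this is not Redline's actual PrimObjectClass; only the indexOfVariable name is borrowed from the discussion.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical class-side layout: one name -> index table per class,
// shared by every instance of that class.
final class ClassLayoutSketch {
    private final Map<String, Integer> indexOf = new LinkedHashMap<>();

    ClassLayoutSketch(ClassLayoutSketch superLayout, String... names) {
        if (superLayout != null) {
            indexOf.putAll(superLayout.indexOf);       // inherited slots come first
        }
        for (String name : names) {
            indexOf.putIfAbsent(name, indexOf.size()); // next free slot
        }
    }

    int indexOfVariable(String name) {
        Integer index = indexOf.get(name);
        if (index == null) {
            throw new IllegalArgumentException("no variable: " + name);
        }
        return index;
    }

    int instSize() { return indexOf.size(); }
}

// Instance side: just an array, accessed by offset with no hashing.
final class SlotObjectSketch {
    final ClassLayoutSketch layout;
    final Object[] slots;

    SlotObjectSketch(ClassLayoutSketch layout) {
        this.layout = layout;
        this.slots = new Object[layout.instSize()];
    }
}
```

The hash lookup happens once per name on the class side; if the resulting index is cached, each later access is a plain array read or write.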

        So, I am trying to understand if it is a decision that you have made based on design considerations or empirical data on performance or ... ?  Thanks.

-- |0|0|

Re: String concatenation -> Object copying implementation: indexed and instance variables

James Ladd
Using a map for instance variables has the following pros:

1. Easy to implement - no need to track what index the next variable
    should get, which has to take the superclass into account.
2. Current lookup involves a Map to find out the index, why not just
    map to value.  We need to look up the variable at runtime, not
    compile time.
3. Variables are dynamic and a map is dynamic whereas an array is not as
     dynamic - it becomes problematic to grow/shrink and map indexes.
4. Classes become more simple since they already contain a map
    and that map will be used for one purpose, not two, i.e. tracking names
    and values.
5. We can swap to an alternative when we have more use cases and
     empirical data.  Working correctly is more important right now.
6. A simplified Java implementation should make it possible for others to
    contribute to the Java side of Redline. Currently this is my domain 99.99%
    of the time.

The cons of using a map are:

7. We have to create the map on first put.
8. We have to check map is created before get.
9. The above #7 & #8 are done to keep space overhead down.
10. Variable lookup may be slow. Might be able to overcome this with
    some on the fly method generation of accessors later.
11. Maps take up space for the key, as well as the value.

- James.


Re: String concatenation -> Object copying implementation: indexed and instance variables

James Ladd
javax.lang.model.type.TypeKind

Do you have an example that uses TypeKind?
Can you elaborate on how this could be used?


Re: String concatenation -> Object copying implementation: indexed and instance variables

Les Hazlewood
In reply to this post by James Ladd
In Redline every object is an instance (based on the changes I'm making) of a class that
looks like this in Java:

public class MetaObject {
  public MetaObject selfclass;
  public MetaObject superclass;
  public Map<String, MetaObject> variables;
  public Object javaValue;
}

When you say "subclass: #MyThing instanceVariableNames: 'foo bar' ..."
the names 'foo' and 'bar' will be added as instance variables in the 'variables' map.
When the class contains an 'indexed' variable then the 'javaValue' will contain
the appropriate Java value to hold the indexed elements,
most likely an ArrayList<MetaObject>.

In your example above "taxId" would be a key in the variables map, and
taxId := '123-45-6789' would set it.
basicAt: and basicAt:put: will be overridden in subclasses to use either
the variables map or javaValue.

This is hugely helpful for me (and I hope others) understanding how Redline works.  It will allow me to better contribute to its core class implementations.  Thanks very much for elaborating!
 
I'm learning here, but why is this done as a <primitive>?  Isn't that the purpose of the class and instance initialize methods?  (I'm going to read the Blue Book on primitives, so if the answer is lengthy, don't worry about it ;)).

The idea in the Pharo implementation is that should a primitive fail, the following code
can be executed instead to provide the implementation. In Redline we are assuming
the primitive won't fail. We are also assuming the primitive will be faster.
Hence we don't follow our primitives with a possible implementation in Smalltalk.

Fascinating.  As I haven't yet had the ability to read the Blue Book on primitives, I wasn't aware that the code following a <primitive: aNum> declaration was "execute this code in case the primitive fails".  Thanks for clarifying!

On a side note, I finally purchased a physical copy of the Blue Book (used of course since it is out of print).  I'm happy to add this piece of history to my book shelf as well as to have the convenience of reading it away from a computer PDF viewer.  (I suppose I could have sent the PDF to my Kindle, but I wanted the physical book anyway!).  

If anyone else reading this thread hasn't read Chapter 1 of the Blue Book [1], it is a clear and succinct resource for understanding OO fundamentals (and not just Smalltalk).  I think it should be read at least once by everyone who programs in any form of OO.

Thanks again,

Les