Modifying a string literal

Modifying a string literal

David Hari
Whilst writing a method recently, I noticed some strange behaviour
regarding the value of a String literal. My method created a blank
string of a fixed length, then each character was filled in based on a
bunch of boolean checks.

Here is an example which demonstrates that behaviour:

TestObject>>testStringLiteral: aBoolean
    | theString |

    theString := '1234'.
    aBoolean ifTrue: [
        theString at: 1 put: $A.
        theString at: 2 put: $B.
        theString at: 3 put: $C.
        theString at: 4 put: $D.
    ].
    ^theString

If the method is called with aBoolean as false, the string returned is
unchanged. But if aBoolean is true, the string is modified and any
further calls to #testStringLiteral: will return the modified string,
even if aBoolean is false.
In other words, the string is permanently set as 'ABCD' even though it
is always initialized to '1234' at the start of the method.

So, why is this happening?

Well, after doing some debugging and investigation, I found that it is
because a reference to the literal is stored in the CompiledMethod.
Although the referent of a literal cannot change, this literal is a
String object, and therefore any of its members *can* be changed (via
#at:put:).
Inspect the CompiledMethod returned from "TestObject methodDictionary
at: #testStringLiteral:" and you will see that the first literal is the
String object. Initially, this object is created as '1234'. But after
the code is run (with aBoolean as true), each character of the object is
changed, so the object is now 'ABCD'. The next time the code is run, the
local variable is initialized to this same object, which has been modified.

This behaviour is unexpected and different from other languages. Is
anyone else aware of this strange behaviour?

***           this signature added by listserv             ***
*** Visit  http://www.listserv.dfn.de/archives/vswe-l.html ***
*** for archive browsing and VSWE-L membership management  ***

Re: Modifying a string literal

James Foster-5
Hi David,

While somewhat unexpected, this behavior is a side-effect of reusing the literal (as you discovered). Some people actually take advantage of this behavior by creating and caching complex objects in a method literal. I prefer Smalltalk dialects that mark method literals as immutable so that you don't stumble into this confusion.

Of course, the way to get what you want is to send #'copy' to the method literal before storing it in the 'theString' method temporary. Then you get a new instance each time and can modify it without side-effects.
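
A minimal sketch of that suggestion, applied to the method from the original post. Only the assignment changes; everything else is the code as David posted it:

```smalltalk
TestObject>>testStringLiteral: aBoolean
    "Work on a copy, so the literal held by the CompiledMethod is never modified."
    | theString |

    theString := '1234' copy.
    aBoolean ifTrue: [
        theString at: 1 put: $A.
        theString at: 2 put: $B.
        theString at: 3 put: $C.
        theString at: 4 put: $D.
    ].
    ^theString
```

With this change, a call with aBoolean as false should answer '1234' again even after an earlier call with true, because each invocation modifies its own fresh String.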

James


Re: Modifying a string literal

David Hari
> Some people actually take advantage of this behavior by creating and
> caching complex objects in a method literal.
I'm not sure what you mean. Are you talking about something like this:

SomeObject>>getCachedObject
    | temp |
    temp := ComplexObject new; ...; yourself.
    ^temp

If so, that behaves as expected and returns a new object at each
invocation, because the object itself is not stored in the
CompiledMethod; only the byte codes required to create the object
are stored.

Besides, I would prefer to cache objects by explicitly storing them in
an instance variable or property. It just makes the code more readable.
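
As a rough sketch of that alternative, a lazily initialized instance variable could look like the following. The selector and the instance variable name cachedFactorial are hypothetical, the class is assumed to declare that variable, and 1000 factorial merely stands in for any expensive computation:

```smalltalk
SomeObject>>cachedFactorial
    "Answer the cached value, computing it on first use.
     Assumption: SomeObject declares an instance variable named cachedFactorial."
    cachedFactorial isNil
        ifTrue: [cachedFactorial := 1000 factorial].
    ^cachedFactorial
```

Unlike a modified method literal, a cache kept this way shows up in an inspector, belongs to a single instance, and is not lost when the method is recompiled.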


> I prefer Smalltalk dialects that mark method literals as immutable so
> that you don't stumble into this confusion.
Yeah, that would be a better way of handling literals. I wonder why
Visual Smalltalk does not do this?




Re: Modifying a string literal

Javier Burroni
Hi David,

On Wed, Nov 3, 2010 at 2:07 AM, David Hari <[hidden email]> wrote:
>> Some people actually take advantage of this behavior by creating and
>> caching complex objects in a method literal.
> I'm not sure what you mean. Are you talking about something like this:
>
> SomeObject>>getCachedObject
>    | temp |
>    temp := ComplexObject new; ...; yourself.
>    ^temp

I think James refers to something like this:
memoizing
        | cache |
        cache := #(nil).
        cache first isNil
                ifTrue: [cache at: 1 put: 1000 factorial]
                ifFalse: [self halt].
        ^cache first

I saw horrible people doing this in front of kids...

Actually, I like the idea of handling the compiledMethod as an object
from the compiledMethod itself (which is done implicitly in this
example). I would like to have some pseudovariable like thisMethod.
What do people think about that?

--
" To be is to do " ( Socrates )
" To be or not to be " ( Shakespeare )
" To do is to be " ( Sartre )
" Do be do be do " ( Sinatra )


Re: Modifying a string literal

Andreas Rosenberg
Hi everybody,

>This behaviour is unexpected and different from other languages. Is
>anyone else aware of this strange behaviour?

Not really strange. Consider this C example:

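#include <stdio.h>

char *foo(int arg)
{
  /* static storage: initialized once, persists across calls */
  static char string[20] = "0123456789ABCDEFGHI";
  if (arg) {
    string[0] = 'a';  /* permanently changes the stored data */
  }
  return string;
}

int main(int argc, char* argv[])
{
  printf("%s\r\n",foo(1));  /* prints a123456789ABCDEFGHI */
  printf("%s\r\n",foo(0));  /* still prints a123456789ABCDEFGHI */
  return 0;
}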

Exactly the same problem as you described.

If a method/function is returning literal objects
this data must be stored somewhere.
If code modifies this data that is created at compile time,
this modification is permanent. This should be basic knowledge
for every programmer.

If you omit the "static" modifier in the C example, you will get random
stuff or even crashes, because the memory where the string was allocated
when invoking foo is no longer valid after it returns. So just consider all
literals in Smalltalk as being static, because they can be returned safely.

Creation of literals during compile time also has performance reasons.
Some programming languages also do "literal folding" (all code pieces
that use the same literal will point to a single instance - this saves
memory).

If code is modifying literals, this is a kind of "self modifying code",
which is usually regarded as one of the worst kinds of programming style.

With very few exceptions, I regard it as a bad idea to modify strings,
regardless of whether they are literal strings or not.
If a string is being used as a key in a dictionary and this string
instance is modified, this can cause dictionary corruption.
When objects are added to dictionaries (generally hashed collections),
the object hash is used to compute internal slot indices.
Modifying a string results in a different hash value, so the lookup
techniques of hashed collections will be fooled.


Example:

| string d |
string := 'ABCDEF'.
d := Dictionary new.
d at: string put: 12345.
string atAllPut: $X.
(d at: string ifAbsent: [nil]) printString , ' ' ,
    (d at: 'ABCDEF' ifAbsent: [nil]) printString.

You can no longer access the dict value with your string reference,
nor with a string containing the same characters.
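
One way to sidestep this, sketched here as a variation on the example above rather than anything taken from a particular code base, is to hand the dictionary its own copy of the key. The string is also copied up front so that the literal itself is never touched:

```smalltalk
| string d |
string := 'ABCDEF' copy.          "work on a copy, not on the literal itself"
d := Dictionary new.
d at: string copy put: 12345.     "the dictionary gets a private key object"
string atAllPut: $X.              "changes only our own string"
d at: 'ABCDEF' ifAbsent: [nil].   "the entry can still be found under the original characters"
```

The key stored in the dictionary is never modified, so its hash stays valid no matter what happens to the string afterwards.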

Regards
  Andreas


Andreas Rosenberg | eMail: [hidden email]
APIS GmbH         | Phone: +49 9482 9415-0
Im Haslet 42      | Fax: +49 9482 9415-55
93086 Wörth/D     | WWW: <http://www.apis.de/>
Germany           | <http://www.fmea.de/>




Re: Modifying a string literal

Douglas Camp
In reply to this post by David Hari
I used to ask a Smalltalk interview question something like:  "You have an accessor which answers a literal -- but what the method answers isn't what you see in the source. What the heck is going on?".

Made for some creative/interesting answers. Most common response was probably just that the source was out of sync with the compiled method and people would advocate either recompiling -- or, sometimes, examining the decompiled source -- a not-bad answer, if not exactly the one I was seeking.

Perhaps not surprisingly, no one lets me do interviews anymore ;-)

Doug

Re: Modifying a string literal

rhawley
In reply to this post by Andreas Rosenberg
It would be best if the literal were automatically copied locally within the method, so that it is no different next time around. It is not in the spirit of Smalltalk to provide quirky behaviours like this, and it is not a necessary feature. Certainly I would consider it bad programming practice to make use of it.

Yours

Bob


On 3 Nov 2010, at 13:59, Andreas Rosenberg <[hidden email]> wrote:

> Hi everybody,
>
>> This behaviour is unexpected and different from other languages. Is
>> anyone else aware of this strange behaviour?
>
> Not really strange. Consider this C example:
>
> char *foo(int arg)
> {
>  static char string[20]="0123456789ABCDEFGHI\0";
>  if (arg) {
>    string[0] = 'a';
>  }
>  return string;
> }
>
> int main(int argc, char* argv[])
> {
>  printf("%s\r\n",foo(1));
>  printf("%s\r\n",foo(0));
>  return 0;
> }
>
> Exactly the same problem, as you described.
>
> If a method/function is returning literal objects
> this data must be stored somewhere.
> If code modifies this data that is created at compile time,
> this modification is permanent. This should be basic knowledge
> for every programmer.
>
> If you omit the "static" modifier in the C example, you will get random
> stuff
> or even crashes, because the memory where the string was allocated
> when invoking foo is no longer valid. So just consider all literals
> in Smalltalk being static, because they can be returned safely.
>
> Creation of literals during compile time also has performance reasons.
> Some programming languages also do "literal folding" (all code pieces
> that use the same literal will point to a single instance - this saves
> memory).
>
> If code is modifing literals, this is a kind of "self modifying code",
> which is usually regarded as one of the worst kinds of programming style.
>
> With very few exceptions, I'm regrading it a bad idea to modify strings,
> regardless if a literal string or not.
> If a string is being used as a key in a dictionary and this string
> instance is being modified, this can cause dictionary corruption.
> When objects are added to dictionaries (generally hashed collections),
> the object hash is used to compute internal slot indicies.
> Modifying a string results in a different hash value. So the lookup
> techniques
> of hashed collections will be fooled.
>
>
> Example:
>
> | string d |
> string := 'ABCDEF'.
> d := Dictionary new.
> d at: string put: 12345.
> string atAllPut: $X.
> (d at: string ifAbsent: [nil]) printString , ' ' , (d at: 'ABCDEF' ifAbsent:
> [nil]) printString.
>
> You can no longer access the dict value with your string reference,
> nor with a string containing the same characters.
>
> Regards
>  Andreas
>
>
> Andreas Rosenberg | eMail: [hidden email]
> APIS GmbH         | Phone: +49 9482 9415-0
> Im Haslet 42      | Fax: +49 9482 9415-55
> 93086 Wörth/D     | WWW: <http://www.apis.de/>
> Germany           | <http://www.fmea.de/>
>
>
>
> -----Original Message-----
> From: Using Visual Smalltalk for Windows/Enterprise
> [mailto:[hidden email]]On Behalf Of Javier Burroni
> Sent: Mittwoch, 3. November 2010 08:36
> To: [hidden email]
> Subject: Re: Modifying a string literal
>
>
> Hi David,
>
> On Wed, Nov 3, 2010 at 2:07 AM, David Hari <[hidden email]> wrote:
>>> Some people actually take advantage of this behavior by creating and
>> caching complex objects in a method literal.
>> I'm not sure what you mean. Are talking about something like this:
>>
>> SomeObject>>getCachedObject
>>    | temp |
>>    temp := ComplexObject new; ...; yourself.
>>    ^temp
>
> I think James reefers to something like this:
> memoizing
>    | cache |
>    cache := #(nil).
>    cache first isNil ifTrue: [cache at: 1 put: 1000 factorial]
> ifFalse:[self halt].
>    ^cache first
>
> I saw horrible people doing this in front of kids...
>
> Actually, I like the idea of handling the compiledMethod as an object
> from the compiledMethod itself (which is done implicitly in this
> example). I would like to have some pseudovariable like thisMethod.
> What do people think about that?
>
>>
>> Because this does as expected and returns a new object at each
>> invocation. Because the object itself is not stored in the
>> CompiledMethod, it only stores the byte codes required to create the
>> object.
>>
>> Besides, I would prefer to cache objects by explicitly storing them in
>> an instance variable or property. It just makes the code more readable.
>>
>>
>>> I prefer Smalltalk dialects that mark method literals as immutable so
>> that you don't stumble into this confusion.
>> Yeah, that would be a better way of handling literals. I wonder why
>> Visual Smalltalk does not do this?
>>
>>
>>
>> -----Original Message-----
>> From: Using Visual Smalltalk for Windows/Enterprise
>> [mailto:[hidden email]] On Behalf Of James Foster
>> Sent: Wednesday, 3 November 2010 3:13 PM
>> To: [hidden email]
>> Subject: Re: Modifying a string literal
>>
>> Hi David,
>>
>> While somewhat unexpected, this behavior is a side-effect of reusing the
>> literal (as you discovered). Some people actually take advantage of this
>> behavior by creating and caching complex objects in a method literal. I
>> prefer Smalltalk dialects that mark method literals as immutable so that
>> you don't stumble into this confusion.
>>
>> Of course, the way to get what you want is to send #'copy' to the method
>> literal before storing it in 'theString' method temporary. Then you get
>> a new instance each time and can modify it without side-effects.
>>
>> James
>>
> --
> " To be is to do " ( Socrates )
> " To be or not to be " ( Shakespeare )
> " To do is to be " ( Sartre )
> " Do be do be do " ( Sinatra )
>

Re: Modifying a string literal

Andreas Rosenberg
While your suggestion may look reasonable, it raises several problems:

What kind of literals should be copied?

Only strings?
  Would be sufficient for your case, but would be a bit inconsistent.

All possible literals?
  Some literals still may not be copied, because they are immutable.
  (symbols, nil, true, false, some integers, some characters)

What about performance issues? Always copying literals at method invocation
may create a lot of load for the garbage collector, although in many
cases the copies are not modified.

Another topic that must be looked upon: object identity.
If some literals would always be copied during method invocation, while
other objects are not copied, this could create a big mess when working
with collections or code that relies on object identity.

An implementation working this way would require new VM opcodes that create
specific copies when accessing literals. Also a new compiler that uses these
opcodes.

I would prefer a different solution:
Immutable subclasses for String, Array etc. The compiler would use
instances of these immutable classes for literals to prevent their
modification.

Would still require a modified compiler (but no new opcodes).
But without VM support, it is very hard to make things really immutable.

Both solutions would require a lot of work and a lot of things must be
changed, where no source is available.

Too much effort and too little gain.

Finally, Cincom would hardly do it.

So we need to learn to live with this.

Andreas


-----Original Message-----
From: Using Visual Smalltalk for Windows/Enterprise
[mailto:[hidden email]]On Behalf Of Robert Hawley
Sent: Wednesday, 3 November 2010 15:15
To: [hidden email]
Subject: Re: Modifying a string literal


It would be best if the literal was automatically copied locally within the
method and not be any different next time around. It is not in the spirit of
Smalltalk to provide quirky behaviours like this - and it is not a necessary
feature. Certainly I would consider it bad programming practice to make use
of it.

Yours

Bob




Re: Modifying a string literal

Todor Todorov
Only mutable literals should be copied. But this creates another problem:
self literal == self literal.

Common sense would expect this to return true, but if copied, that will not be the case.

The correct solution will be to implement immutable (read-only) objects. Other dialects have that feature. An achievable solution, but not easy to implement, is to subclass String and re-implement all methods that may modify the object to signal an exception. Then hack the compiler to compile to those read-only string literals ... and go through all existing methods to replace the string literal with read-only strings.

Simplest solution is to not modify strings that are literal. Easy to implement, difficult to enforce :-/

Just my humble opinion.

-- Todor


Re: Modifying a string literal

rhawley
In reply to this post by Andreas Rosenberg
… all fair points.

I don't think that it should be treated as a feature however - I reiterate that it is bad practice to use this deliberately.

Yours

Bob

Re: Modifying a string literal

David Hari
In reply to this post by Todor Todorov
"The correct solution will be to implement immutable (read-only) objects. Other dialects have that feature. An achievable solution, but not easy to implement, is to subclass String and re-implement all methods that may modify the object to signal an exception. Then hack the compiler to compile to those read-only string literals ... and go through all existing methods to replace the string literal with read-only strings."

Yes, I was thinking something like that too. As I noticed that Symbol overrides #at:put: to prevent modification, the same could be done for subclasses of String and Array (say, StringLiteral and ArrayLiteral). But like you said, the simplest solution is to not modify literals (which I don't normally do anyway).




Re: Modifying a string literal

David Hari
In reply to this post by Andreas Rosenberg
A few things to note about your C example:

1) You are declaring the string as static, which means you are explicitly saying that it will be assigned the first time the function is run and the variable will still contain the same data after the function returns, sort of like a global (if my understanding of this "static" usage is correct).
Whereas in Smalltalk, I did not expect a modified literal to stay that way after the function returned (I didn't think of it as "static").

2) Doesn't declaring it in that way mean that the literal ("0123456789ABCDEFGHI\0") is *copied* into the "string[20]" array? (This is just what I found by a quick Google search.)
Declaring a pointer to the literal would not work:

char *foo(int arg) {
  static char* string = "0123456789ABCDEFGHI\0";
  if (arg) {
    string[0] = 'a';
  }
  return string;
}

In other words, modifying string literals in C does not technically work, yet it is possible in Smalltalk.
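
To make the difference between the two declarations concrete, here is a minimal sketch in standard C. The names array_form, pointer_form and demo_forms are made up for illustration and do not come from the thread. The array form copies the literal's bytes into storage the function owns, so writing to it is fine; the pointer form refers to the literal itself, and writing through it is undefined behaviour in standard C, which is why the sketch declares it const.

```c
#include <assert.h>
#include <string.h>

/* Array form: the literal's bytes are copied into a static array owned by
   the function, so modifying it is well defined and the change persists. */
char *array_form(int arg) {
    static char string[] = "0123456789";
    if (arg) {
        string[0] = 'a';
    }
    return string;
}

/* Pointer form: the pointer refers to the literal itself. Declaring it
   const lets the compiler reject any attempt to write through it. */
const char *pointer_form(void) {
    static const char *string = "0123456789";
    /* string[0] = 'a';   would not compile: the target is read-only */
    return string;
}

/* Walks through the calls in the order discussed in the thread. */
void demo_forms(void) {
    assert(strcmp(array_form(0), "0123456789") == 0);  /* untouched so far */
    assert(strcmp(array_form(1), "a123456789") == 0);  /* modified */
    assert(strcmp(array_form(0), "a123456789") == 0);  /* change persists */
    assert(strcmp(pointer_form(), "0123456789") == 0); /* literal unchanged */
}
```

So the C array version behaves like the Smalltalk method (the change sticks), while the pointer version is the case that C itself does not permit.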

Perhaps it is not strange and I should have known this, but at least I learned something today :)



-----Original Message-----
From: Using Visual Smalltalk for Windows/Enterprise [mailto:[hidden email]] On Behalf Of Andreas Rosenberg
Sent: Thursday, 4 November 2010 12:19 AM
To: [hidden email]
Subject: Re: Modifying a string literal

Hi everybody,

>This behaviour is unexpected and different from other languages. Is
>anyone else aware of this strange behaviour?

Not really strange. Consider this C example:

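#include <stdio.h>

char *foo(int arg)
{
  /* static storage: initialized once, shared by every call */
  static char string[20]="0123456789ABCDEFGHI\0";
  if (arg) {
    string[0] = 'a';  /* permanently changes the shared buffer */
  }
  return string;
}

int main(int argc, char* argv[])
{
  printf("%s\r\n",foo(1));
  printf("%s\r\n",foo(0));
  return 0;
}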

Exactly the same problem, as you described.

If a method/function is returning literal objects
this data must be stored somewhere.
If code modifies this data that is created at compile time,
this modification is permanent. This should be basic knowledge
for every programmer.

If you omit the "static" modifier in the C example, you will get random
stuff or even crashes, because the memory where the string was allocated
when invoking foo is no longer valid. So just consider all literals
in Smalltalk being static, because they can be returned safely.

Creation of literals during compile time also has performance reasons.
Some programming languages also do "literal folding" (all code pieces
that use the same literal will point to a single instance - this saves
memory).

If code is modifying literals, this is a kind of "self modifying code",
which is usually regarded as one of the worst kinds of programming style.

With very few exceptions, I regard it as a bad idea to modify strings,
regardless of whether it is a literal string or not.
If a string is being used as a key in a dictionary and this string
instance is being modified, this can cause dictionary corruption.
When objects are added to dictionaries (generally hashed collections),
the object hash is used to compute internal slot indices.
Modifying a string results in a different hash value. So the lookup
techniques of hashed collections will be fooled.


Example:

| string d |
string := 'ABCDEF'.
d := Dictionary new.
d at: string put: 12345.
string atAllPut: $X.
(d at: string ifAbsent: [nil]) printString , ' ' ,
    (d at: 'ABCDEF' ifAbsent: [nil]) printString.

You can no longer access the dict value with your string reference,
nor with a string containing the same characters.
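
The same effect can be reproduced outside Smalltalk. The sketch below is a toy open-addressing table in C, written only for illustration: hash_str, put, get and demo_lost_entry are hypothetical helpers, not a real library API, and this is not a claim about how Visual Smalltalk implements Dictionary. It stores the key by reference, so overwriting the key's characters afterwards leaves the entry in the slot chosen for the old contents. A lookup with the old contents reaches that slot but no longer matches, and in this sketch the new contents hash to a different, empty slot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SLOTS 8

struct entry { char *key; int value; };   /* key is stored by reference */
static struct entry table[SLOTS];

/* Simple content-based string hash; any such hash would do. */
static unsigned hash_str(const char *s) {
    unsigned h = 5381u;
    while (*s) {
        h = h * 33u + (unsigned char)*s++;
    }
    return h;
}

/* Insert with linear probing; assumes the table never fills up. */
void put(char *key, int value) {
    unsigned i = hash_str(key) % SLOTS;
    while (table[i].key != NULL) {
        i = (i + 1) % SLOTS;
    }
    table[i].key = key;
    table[i].value = value;
}

/* Returns a pointer to the stored value, or NULL if the key is not found. */
int *get(const char *key) {
    unsigned i = hash_str(key) % SLOTS;
    for (unsigned n = 0; n < SLOTS && table[i].key != NULL; n++) {
        if (strcmp(table[i].key, key) == 0) {
            return &table[i].value;
        }
        i = (i + 1) % SLOTS;
    }
    return NULL;
}

/* Mirrors the Smalltalk snippet: returns 1 if both lookups fail once the
   key has been overwritten in place. */
int demo_lost_entry(void) {
    static char key[] = "ABCDEF";
    put(key, 12345);
    assert(get("ABCDEF") != NULL);     /* found while the key is intact */
    memset(key, 'X', strlen(key));     /* like: string atAllPut: $X */
    return get(key) == NULL && get("ABCDEF") == NULL;
}
```

With a different hash function the lookup by the mutated key might happen to land on the right slot, so the failure is not guaranteed in general; that unpredictability is part of the problem.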

Regards
  Andreas


Andreas Rosenberg | eMail: [hidden email]
APIS GmbH         | Phone: +49 9482 9415-0
Im Haslet 42      | Fax: +49 9482 9415-55
93086 Wörth/D     | WWW: <http://www.apis.de/>
Germany           | <http://www.fmea.de/>



-----Original Message-----
From: Using Visual Smalltalk for Windows/Enterprise
[mailto:[hidden email]]On Behalf Of Javier Burroni
Sent: Wednesday, 3 November 2010 08:36
To: [hidden email]
Subject: Re: Modifying a string literal


Hi David,

On Wed, Nov 3, 2010 at 2:07 AM, David Hari <[hidden email]> wrote:
>> Some people actually take advantage of this behavior by creating and
>> caching complex objects in a method literal.
> I'm not sure what you mean. Are you talking about something like this:
>
> SomeObject>>getCachedObject
>    | temp |
>    temp := ComplexObject new; ...; yourself.
>    ^temp

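I think James refers to something like this:
memoizing
        "cache is the literal Array held by the CompiledMethod, so its
         contents survive from one call to the next"
        | cache |
        cache := #(nil).
        cache first isNil
                ifTrue: [cache at: 1 put: 1000 factorial]
                ifFalse: [self halt].
        ^cache first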

I saw horrible people doing this in front of kids...

Actually, I like the idea of handling the compiledMethod as an object
from the compiledMethod itself (which is done implicitly in this
example). I would like to have some pseudovariable like thisMethod.
What do people think about that?


Re: Modifying a string literal

Andreas Rosenberg
> 1) You are declaring the string as static, which means you are explicitly
> saying that it will be assigned the first time the function is run and the
> variable will still contain the same data after the function returns, sort
> of like a global (if my understanding of this "static" usage is correct).
> Whereas in Smalltalk, I did not expect a modified literal to stay that way
> after the function returned (I didn't think of it as "static").

Technically, a binary executable has three kinds of segments:
 - code (the machine code)
 - data (all kinds of literals)
 - bss  (uninitialized variables)

So technically it makes no difference whether a literal is used inside a
function or assigned to a global variable. It will always be located in the
data segment. Talking of a "sort of global variable" is not correct in
my eyes, because it does not distinguish between the actual data and the
access scope. A global variable has a global access scope.
Static means that each invocation of function foo will use the identical
instance of the string object (using Smalltalk terms), which will be
accessed through the local variable "string".
So in fact, it is the same as in Smalltalk. Each invocation of your method
will access the identical instance of the string located in the compiled
method.
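
Put into C, the point is that every call hands back a pointer to the same storage, much as every activation of the Smalltalk method sees the same String held by the CompiledMethod. A small sketch, reusing foo from the earlier example (demo_same_instance is a hypothetical helper added here only to make the check explicit):

```c
#include <assert.h>

char *foo(int arg) {
    /* one buffer for the whole program run, initialized before first use */
    static char string[20] = "0123456789ABCDEFGHI";
    if (arg) {
        string[0] = 'a';
    }
    return string;
}

/* Two calls yield the identical buffer, and the change made by the first
   call is still visible through the pointer returned by the second. */
void demo_same_instance(void) {
    char *first = foo(1);
    char *second = foo(0);
    assert(first == second);   /* same "instance", in Smalltalk terms */
    assert(second[0] == 'a');  /* modification survived the first call */
}
```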

You are right, if you omit the static keyword the string will be copied.
The following asm code will be created:

        mov       esi,offset $string
        lea       edi,dword ptr [ebp-20] ; create local buffer on stack
        mov       ecx,5
        rep       movsd  ; string is being copied into local buffer
        cmp       dword ptr [ebp+8],0
        je        short @2
@2:
        lea       eax,dword ptr [ebp-20]  ; return address of buffer on stack!!

The bytes representing the string are copied into a local buffer of the
function (the local variable string).
BUT if you return this variable, you are effectively returning a pointer to
the temporary buffer on the stack. Once the function returns, this buffer is
invalid and will sooner or later contain "random" bytes, because this stack
space will be overwritten during later function calls. This is a common
error for inexperienced C programmers. If you modify the data, you will
effectively modify the machine stack, which will cause crashes sooner or
later.

So if you want to return a pointer that remains valid after exiting the
function, you MUST use static. Then you get this code:

        cmp       dword ptr [ebp+8],0
        je        short @2
        mov       byte ptr [$string],97
@2:
        mov       eax,offset $string

So to speak: Smalltalk prevents a common C mistake, by regarding all
literals as static ;-)
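
For completeness: if sharing one static buffer is not what you want, a common C counterpart of sending copy to the literal is to let the caller own the storage. A minimal sketch under that assumption (foo_into, its parameters and demo_foo_into are made-up names, not part of the original example):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies the initial text into caller-owned storage and optionally modifies
   the copy. Returns buf, or NULL if the buffer is too small. */
char *foo_into(char *buf, size_t size, int arg) {
    static const char initial[] = "0123456789ABCDEFGHI";
    if (size < sizeof initial) {
        return NULL;
    }
    memcpy(buf, initial, sizeof initial);  /* includes the terminating NUL */
    if (arg) {
        buf[0] = 'a';                      /* only the caller's copy changes */
    }
    return buf;
}

/* Each call starts from the unmodified text, unlike the static version. */
void demo_foo_into(void) {
    char first[20];
    char second[20];
    assert(foo_into(first, sizeof first, 1) == first);
    assert(foo_into(second, sizeof second, 0) == second);
    assert(strcmp(first, "a123456789ABCDEFGHI") == 0);
    assert(strcmp(second, "0123456789ABCDEFGHI") == 0);
}
```

The price is the one mentioned earlier in the thread: every call copies the bytes, whether or not the copy is ever modified.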

Regards
   Andreas


Andreas Rosenberg | eMail: [hidden email]
APIS GmbH         | Phone: +49 9482 9415-0
Im Haslet 42      | Fax: +49 9482 9415-55
93086 Wörth/D     | WWW: <http://www.apis.de/>
Germany           | <http://www.fmea.de/>




-----Original Message-----
From: Using Visual Smalltalk for Windows/Enterprise
[mailto:[hidden email]]On Behalf Of David Hari
Sent: Donnerstag, 4. November 2010 00:56
To: [hidden email]
Subject: Re: Modifying a string literal


A few things to note about your C example:

1) You are declaring the string as static, which means you are explicitly
saying that it will be assigned the first time the function is ran and the
variable will still contain the same data after the function returns, sort
of like a global (if my understanding of this "static" usage is correct).
Where as in Smalltalk, i did not expect a modified literal to stay that way
after the function returned (i didn't think of it as "static").

2) Doesn't declaring it in that way mean that the literal
("0123456789ABCDEFGHI\0") is *copied* into the "string[20]" array? (This is
just what i found by a quick google search).
Declaring a pointer to the literal would not work:

char *foo(int arg) {
  static char* string = "0123456789ABCDEFGHI\0";
  if (arg) {
    string[0] = 'a';
  }
  return string;
}

In other words, modifying string literals in C does not technically work,
yet it is possible in Smalltalk.

Perhaps it is not strange and i should have known this, but at least i
learned something today :)



-----Original Message-----
From: Using Visual Smalltalk for Windows/Enterprise
[mailto:[hidden email]] On Behalf Of Andreas Rosenberg
Sent: Thursday, 4 November 2010 12:19 AM
To: [hidden email]
Subject: Re: Modifying a string literal

Hi everybody,

>This behaviour is unexpected and different from other languages. Is
>anyone else aware of this strange behaviour?

Not really strange. Consider this C example:

char *foo(int arg)
{
  static char string[20]="0123456789ABCDEFGHI\0";
  if (arg) {
    string[0] = 'a';
  }
  return string;
}

int main(int argc, char* argv[])
{
  printf("%s\r\n",foo(1));
  printf("%s\r\n",foo(0));
  return 0;
}

Exactly the same problem, as you described.

If a method/function is returning literal objects
this data must be stored somewhere.
If code modifies this data that is created at compile time,
this modification is permanent. This should be basic knowledge
for every programmer.

If you omit the "static" modifier in the C example, you will get random
stuff
or even crashes, because the memory where the string was allocated
when invoking foo is no longer valid. So just consider all literals
in Smalltalk being static, because they can be returned safely.

Creation of literals during compile time also has performance reasons.
Some programming languages also do "literal folding" (all code pieces
that use the same literal will point to a single instance - this saves
memory).

If code is modifying literals, this is a kind of "self modifying code",
which is usually regarded as one of the worst kinds of programming style.

With very few exceptions, I regard it as a bad idea to modify strings,
regardless of whether it is a literal string or not.
If a string is being used as a key in a dictionary and this string
instance is modified, this can cause dictionary corruption.
When objects are added to dictionaries (generally hashed collections),
the object hash is used to compute internal slot indices.
Modifying a string results in a different hash value, so the lookup
techniques of hashed collections will be fooled.


Example:

| string d |
string := 'ABCDEF'.
d := Dictionary new.
d at: string put: 12345.
string atAllPut: $X.
(d at: string ifAbsent: [nil]) printString , ' ' ,
    (d at: 'ABCDEF' ifAbsent: [nil]) printString.

You can no longer access the dict value with your string reference,
nor with a string containing the same characters.
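The slot computation behind this can be sketched in C. This is only an illustration under assumptions: the hash below is a generic djb2-style function, not the one Visual Smalltalk actually uses, and hash_text and slot_for are hypothetical names.

```c
#include <stddef.h>

/* Generic djb2-style string hash, chosen only for illustration. */
unsigned long hash_text(const char *text)
{
    unsigned long hash = 5381;
    while (*text != '\0') {
        hash = hash * 33 + (unsigned char)*text;
        text++;
    }
    return hash;
}

/* Index of the slot a key would occupy in a table with slot_count slots. */
size_t slot_for(const char *key, size_t slot_count)
{
    return (size_t)(hash_text(key) % slot_count);
}
```

A key stored while it reads "ABCDEF" sits in the slot computed from those characters. After the same buffer is overwritten with "XXXXXX", slot_for points the lookup somewhere else, and a lookup with a fresh "ABCDEF" reaches the old slot but no longer finds an equal key there.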

Regards
  Andreas


Andreas Rosenberg | eMail: [hidden email]
APIS GmbH         | Phone: +49 9482 9415-0
Im Haslet 42      | Fax: +49 9482 9415-55
93086 Wörth/D     | WWW: <http://www.apis.de/>
Germany           | <http://www.fmea.de/>



-----Original Message-----
From: Using Visual Smalltalk for Windows/Enterprise
[mailto:[hidden email]]On Behalf Of Javier Burroni
Sent: Wednesday, 3 November 2010 08:36
To: [hidden email]
Subject: Re: Modifying a string literal


Hi David,

On Wed, Nov 3, 2010 at 2:07 AM, David Hari <[hidden email]> wrote:
>> Some people actually take advantage of this behavior by creating and
> caching complex objects in a method literal.
> I'm not sure what you mean. Are you talking about something like this:
>
> SomeObject>>getCachedObject
>    | temp |
>    temp := ComplexObject new; ...; yourself.
>    ^temp

I think James refers to something like this:
memoizing
        | cache |
        cache := #(nil).
        cache first isNil
                ifTrue: [cache at: 1 put: 1000 factorial]
                ifFalse: [self halt].
        ^cache first

I saw horrible people doing this in front of kids...

Actually, I like the idea of handling the compiledMethod as an object
from the compiledMethod itself (which is done implicitly in this
example). I would like to have some pseudovariable like thisMethod.
What do people think about that?

>
> Because this does as expected and returns a new object at each
> invocation. Because the object itself is not stored in the
> CompiledMethod, it only stores the byte codes required to create the
> object.
>
> Besides, I would prefer to cache objects by explicitly storing them in
> an instance variable or property. It just makes the code more readable.
>
>
>> I prefer Smalltalk dialects that mark method literals as immutable so
> that you don't stumble into this confusion.
> Yeah, that would be a better way of handling literals. I wonder why
> Visual Smalltalk does not do this?
>
>
>
> -----Original Message-----
> From: Using Visual Smalltalk for Windows/Enterprise
> [mailto:[hidden email]] On Behalf Of James Foster
> Sent: Wednesday, 3 November 2010 3:13 PM
> To: [hidden email]
> Subject: Re: Modifying a string literal
>
> Hi David,
>
> While somewhat unexpected, this behavior is a side-effect of reusing the
> literal (as you discovered). Some people actually take advantage of this
> behavior by creating and caching complex objects in a method literal. I
> prefer Smalltalk dialects that mark method literals as immutable so that
> you don't stumble into this confusion.
>
> Of course, the way to get what you want is to send #'copy' to the method
> literal before storing it in 'theString' method temporary. Then you get
> a new instance each time and can modify it without side-effects.
>
> James
>



--
" To be is to do " ( Socrates )
" To be or not to be " ( Shakespeare )
" To do is to be " ( Sartre )
" Do be do be do " ( Sinatra )


Re: Modifying a string literal

Claus Kick
In reply to this post by James Foster-5
James Foster schrieb:

*snip*

> Of course, the way to get what you want is to send #'copy' to the method literal before storing it in 'theString' method temporary.
> Then you get a new instance each time and can modify it without side-effects.

IIRC #copy in VS(E) only copies a reference, but does not create a new
object. You have to use deepCopy for that.

How did that example go:

|a b |

a := 'abcd'.

b := a.

a at: 1 put: $e.

^b


Re: Modifying a string literal

Todor Todorov
No, variable assignment copies the reference. Example:
a := Object new.
b := a. " variable <b> now has a copy of the object reference that is stored in <a>. "

The #copy method usually does a #shallowCopy, which usually copies ONE level of object references. Example:
o1:= Object new.
o2 := Object new.
a := Array with: o1 with: o2.
b := a copy. " Same as #shallowCopy. "
b := a shallowCopy. " Creates a new array and copies the contents of the array, but not the contained objects themselves. "
" ... equivalent to: "
b := Array with: o1 with: o2.


The #deepCopy is usually implemented to copy TWO levels down.
o1:= Object new.
o2 := Object new.
a := Array with: o1 with: o2.
b := a deepCopy. " Creates a new array and copies the contents of the array, while copying the contained objects themselves. "
" ... equivalent to: "
b := Array with: o1 shallowCopy with: o2 shallowCopy.


Some objects have re-implemented #copy where more knowledge exists about the copy process. An example is a Dictionary (HashedCollection), which technically contains several child objects but conceptually is a collection of associations. See the documentation of those objects for more info. When using the copy messages, always check whether a local implementation exists and whether it has any interesting comments on usage.
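For readers who think in C, the same distinction can be sketched with a container of pointers. This is a loose analogy, not how Smalltalk implements #shallowCopy or #deepCopy; Pair, shallow_copy and deep_copy are names invented for this sketch.

```c
#include <stdlib.h>

/* A two-slot container that holds references (pointers) to its elements. */
typedef struct {
    int *first;
    int *second;
} Pair;

/* Shallow copy: a new container whose slots refer to the SAME elements. */
Pair shallow_copy(const Pair *source)
{
    return *source;
}

/* Deep copy, one extra level: a new container with copied elements.
   The caller owns the new elements and must free() them.
   Both slots are NULL if an allocation fails. */
Pair deep_copy(const Pair *source)
{
    Pair result = { NULL, NULL };
    int *first = malloc(sizeof *first);
    int *second = malloc(sizeof *second);
    if (first == NULL || second == NULL) {
        free(first);
        free(second);
        return result;
    }
    *first = *source->first;
    *second = *source->second;
    result.first = first;
    result.second = second;
    return result;
}
```

Changing an element through the original is visible through the shallow copy, because both containers share it, but not through the deep copy, which has its own elements.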

-- Todor


Re: Modifying a string literal

Claus Kick
Yes, you are right of course. It has been a while since I last used any Smalltalk.

-- 
Claus Kick

"Wenn Sie mich suchen: Ich halte mich in der Nähe des Wahnsinns auf. 
Genauer gesagt auf der schmalen Linie zwischen Wahnsinn und Panik. 
Gleich um die Ecke von Todesangst, nicht weit weg von Irrwitz und Idiotie."

"If you are looking for me: I am somewhere near to lunacy. 
More clearly, on the narrow path between lunacy and panic. 
Right around the corner of  fear of death, 
not far away from idiocy and insanity."
