Whilst writing a method recently, I noticed some strange behaviour regarding the value of a String literal. My method created a blank string of a fixed length, then each character was filled in based on a bunch of boolean checks.

Here is an example which demonstrates that behaviour:

    TestObject>>testStringLiteral: aBoolean
        | theString |

        theString := '1234'.
        aBoolean ifTrue: [
            theString at: 1 put: $A.
            theString at: 2 put: $B.
            theString at: 3 put: $C.
            theString at: 4 put: $D.
        ].
        ^theString

If the method is called with aBoolean as false, the string returned is unchanged. But if aBoolean is true, the string is modified and any further calls to #testStringLiteral: will return the modified string, even if aBoolean is false.
In other words, the string is permanently set as 'ABCD' even though it is always initialized to '1234' at the start of the method.

So, why is this happening?

Well, after doing some debugging and investigation, I found that it is because a reference to the literal is stored in the CompiledMethod and, although the referent of a literal cannot change, this literal is a String object, and therefore any of its members *can* be changed (via #at:put:).
Inspect the CompiledMethod returned from "TestObject methodDictionary at: #testStringLiteral:" and you will see that the first literal is the String object. Initially, this object is created as '1234'. But after the code is run (with aBoolean as true), each character of the object is changed so the object is now 'ABCD'. The next time the code is run, the local variable is initialized to this object, which has been modified.

This behaviour is unexpected and different from other languages. Is anyone else aware of this strange behaviour?

*** this signature added by listserv ***
*** Visit http://www.listserv.dfn.de/archives/vswe-l.html ***
*** for archive browsing and VSWE-L membership management ***
Hi David,
While somewhat unexpected, this behavior is a side-effect of reusing the literal (as you discovered). Some people actually take advantage of this behavior by creating and caching complex objects in a method literal. I prefer Smalltalk dialects that mark method literals as immutable so that you don't stumble into this confusion.

Of course, the way to get what you want is to send #'copy' to the method literal before storing it in the 'theString' method temporary. Then you get a new instance each time and can modify it without side-effects.

James

On Nov 2, 2010, at 8:42 PM, David Hari wrote:
> Whilst writing a method recently, i noticed some strange behaviour
> regarding the value of a String literal. [...]
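James's suggestion, applied to the method from the original post, might look like the sketch below. It assumes that #copy sent to a String answers a new, independent instance, which is the usual behaviour; the class and selector are the ones from the original example, and the "TestObject>>" header is only the naming convention used in this thread, not something to type into a browser.

```smalltalk
TestObject>>testStringLiteral: aBoolean
    | theString |

    "Copy the literal so that the instance stored in the
     CompiledMethod is never modified by the #at:put: sends below."
    theString := '1234' copy.
    aBoolean ifTrue: [
        theString at: 1 put: $A.
        theString at: 2 put: $B.
        theString at: 3 put: $C.
        theString at: 4 put: $D.
    ].
    ^theString
```

With this change every call starts from a fresh '1234', so a call with aBoolean as true should no longer affect later calls.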
> Some people actually take advantage of this behavior by creating and
> caching complex objects in a method literal.

I'm not sure what you mean. Are you talking about something like this:

    SomeObject>>getCachedObject
        | temp |
        temp := ComplexObject new; ...; yourself.
        ^temp

That does what I would expect and returns a new object at each invocation, because the object itself is not stored in the CompiledMethod; only the byte codes required to create the object are stored.

Besides, I would prefer to cache objects by explicitly storing them in an instance variable or property. It just makes the code more readable.

> I prefer Smalltalk dialects that mark method literals as immutable so
> that you don't stumble into this confusion.

Yeah, that would be a better way of handling literals. I wonder why Visual Smalltalk does not do this?

-----Original Message-----
From: Using Visual Smalltalk for Windows/Enterprise [mailto:[hidden email]] On Behalf Of James Foster
Sent: Wednesday, 3 November 2010 3:13 PM
To: [hidden email]
Subject: Re: Modifying a string literal

> Hi David,
>
> While somewhat unexpected, this behavior is a side-effect of reusing
> the literal (as you discovered). [...]
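The explicit caching David says he prefers could be sketched roughly as follows. This is only an illustration: it assumes SomeObject declares an instance variable named cachedObject (a name invented here), and ComplexObject is the placeholder class from his example.

```smalltalk
SomeObject>>cachedObject
    "Answer the cached object, creating it on first use.
     Assumes an instance variable named cachedObject (hypothetical)."
    cachedObject isNil
        ifTrue: [cachedObject := ComplexObject new].
    ^cachedObject
```

Unlike a cache hidden in a method literal, the cached value here is visible in an inspector, belongs to one receiver, and can be reset simply by assigning nil to the variable.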
Hi David,
On Wed, Nov 3, 2010 at 2:07 AM, David Hari <[hidden email]> wrote:
>> Some people actually take advantage of this behavior by creating and
>> caching complex objects in a method literal.
> I'm not sure what you mean. Are you talking about something like this:
>
> SomeObject>>getCachedObject
>     | temp |
>     temp := ComplexObject new; ...; yourself.
>     ^temp

I think James refers to something like this:

    memoizing
        | cache |
        cache := #(nil).
        cache first isNil
            ifTrue: [cache at: 1 put: 1000 factorial]
            ifFalse: [self halt].
        ^cache first

I saw horrible people doing this in front of kids...

Actually, I like the idea of handling the compiledMethod as an object from the compiledMethod itself (which is done implicitly in this example). I would like to have some pseudovariable like thisMethod. What do people think about that?

--
" To be is to do " ( Socrates )
" To be or not to be " ( Shakespeare )
" To do is to be " ( Sartre )
" Do be do be do " ( Sinatra )
Hi everybody,
> This behaviour is unexpected and different from other languages. Is
> anyone else aware of this strange behaviour?

Not really strange. Consider this C example:

    #include <stdio.h>

    char *foo(int arg)
    {
        /* static: the array keeps its contents between calls */
        static char string[20] = "0123456789ABCDEFGHI";
        if (arg) {
            string[0] = 'a';
        }
        return string;
    }

    int main(int argc, char *argv[])
    {
        printf("%s\r\n", foo(1));
        printf("%s\r\n", foo(0));
        return 0;
    }

Exactly the same problem as you described.

If a method/function returns literal objects, this data must be stored somewhere. If code modifies data that was created at compile time, the modification is permanent. This should be basic knowledge for every programmer.

If you omit the "static" modifier in the C example, you will get random stuff or even crashes, because the memory where the string was allocated when invoking foo is no longer valid after it returns. So just consider all literals in Smalltalk as being static, because they can be returned safely.

Literals are also created at compile time for performance reasons. Some programming languages also do "literal folding" (all code pieces that use the same literal will point to a single instance, which saves memory).

If code modifies literals, this is a kind of "self modifying code", which is usually regarded as one of the worst kinds of programming style.

With very few exceptions, I regard it as a bad idea to modify strings, regardless of whether the string is a literal or not. If a string is being used as a key in a dictionary and this string instance is modified, this can cause dictionary corruption. When objects are added to dictionaries (hashed collections in general), the object hash is used to compute internal slot indices. Modifying a string results in a different hash value, so the lookup techniques of hashed collections will be fooled.

Example:

    | string d |
    string := 'ABCDEF'.
    d := Dictionary new.
    d at: string put: 12345.
    string atAllPut: $X.
    (d at: string ifAbsent: [nil]) printString , ' ' ,
        (d at: 'ABCDEF' ifAbsent: [nil]) printString.

You can no longer access the dict value with your string reference, nor with a string containing the same characters.

Regards
Andreas

Andreas Rosenberg | eMail: [hidden email]
APIS GmbH         | Phone: +49 9482 9415-0
Im Haslet 42      | Fax:   +49 9482 9415-55
93086 Wörth/D     | WWW:   <http://www.apis.de/>
Germany           |        <http://www.fmea.de/>

-----Original Message-----
From: Using Visual Smalltalk for Windows/Enterprise [mailto:[hidden email]] On Behalf Of Javier Burroni
Sent: Wednesday, 3 November 2010 08:36
To: [hidden email]
Subject: Re: Modifying a string literal

> I think James refers to something like this: [...]
In reply to this post by David Hari
I used to ask a Smalltalk interview question something like: "You have an accessor which answers a literal -- but what the method answers isn't what you see in the source. What the heck is going on?".
Made for some creative/interesting answers. Most common response was probably just that the source was out of sync with the compiled method and people would advocate either recompiling -- or, sometimes, examining the decompiled source -- a not-bad answer, if not exactly the one I was seeking.

Perhaps not surprisingly, no one lets me do interviews anymore ;-)

Doug
In reply to this post by Andreas Rosenberg
It would be best if the literal was automatically copied locally within the method and not be any different next time around. It is not in the spirit of Smalltalk to provide quirky behaviours like this - and it is not a necessary feature. Certainly I would consider it bad programming practice to make use of it.
Yours
Bob

On 3 Nov 2010, at 13:59, Andreas Rosenberg <[hidden email]> wrote:
> Not really strange. Consider this C example: [...]
While your suggestion may look reasonable, it raises several problems:

What kind of literals should be copied? Only strings? That would be sufficient for your case, but a bit inconsistent. All possible literals? Some literals would still not be copied, because they are immutable (symbols, nil, true, false, some integers, some characters).

What about performance? Always copying literals at method invocation may create a lot of load for the garbage collector, although in many cases the copies are never modified.

Another topic that must be looked at: object identity. If some literals were always copied during method invocation while other objects were not, this could create a big mess when working with collections or code that relies on object identity.

An implementation working this way would require new VM opcodes that create specific copies when accessing literals, and also a new compiler that uses these opcodes.

I would prefer a different solution: immutable subclasses for String, Array etc. The compiler would use instances of these immutable classes for literals to prevent their modification. This would still require a modified compiler (but no new opcodes). But without VM support, it is very hard to make things really immutable.

Both solutions would require a lot of work, and a lot of things would have to be changed where no source is available. Too much effort and too little gain. Finally, Cincom would hardly do it. So we need to learn to live with this.

Andreas

-----Original Message-----
From: Using Visual Smalltalk for Windows/Enterprise [mailto:[hidden email]] On Behalf Of Robert Hawley
Sent: Wednesday, 3 November 2010 15:15
To: [hidden email]
Subject: Re: Modifying a string literal

It would be best if the literal was automatically copied locally within the method and not be any different next time around. It is not in the spirit of Smalltalk to provide quirky behaviours like this - and it is not a necessary feature. Certainly I would consider it bad programming practice to make use of it.

Yours
Bob
|
Only mutable literals should be copied. But this creates another problem:

    self literal == self literal.

Common sense would expect this to return true, but if the literal is copied, that will not be the case.

The correct solution would be to implement immutable (read-only) objects. Other dialects have that feature.

An achievable solution, but not an easy one to implement, is to subclass String and re-implement all methods that may modify the object so that they signal an exception. Then hack the compiler to compile string literals as those read-only strings ... and go through all existing methods to replace the string literals with read-only strings.

The simplest solution is to not modify strings that are literals. Easy to implement, difficult to enforce :-/

Just my humble opinion.

-- Todor
|
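The identity problem Todor mentions can be made concrete with a small C sketch (C only because it can be compiled and checked anywhere; it is an analogy, not a description of how any Smalltalk VM works). The names shared_literal and copied_literal are made up for this illustration: the first stands in for a method that answers its literal directly, the second for a method that answers a fresh copy on every call.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a method answering its literal directly:
   every call answers the same storage. */
char *shared_literal(void)
{
    static char text[] = "1234";
    return text;
}

/* Hypothetical stand-in for a method answering a copy of its literal.
   The caller owns the result and must free() it; NULL means the
   allocation failed. */
char *copied_literal(void)
{
    static const char text[] = "1234";
    char *copy = malloc(sizeof text);
    if (copy != NULL) {
        memcpy(copy, text, sizeof text);
    }
    return copy;
}

/* Answers 1 if two calls answer the very same object, else 0. */
int shared_is_identical(void)
{
    return shared_literal() == shared_literal();
}

/* Answers 1 if two copies are the same object, 0 if they are distinct
   objects, and -1 if an allocation failed. */
int copies_are_identical(void)
{
    char *a = copied_literal();
    char *b = copied_literal();
    int result = -1;
    if (a != NULL && b != NULL) {
        assert(strcmp(a, b) == 0);  /* the contents are equal either way */
        result = (a == b);
    }
    free(a);
    free(b);
    return result;
}
```

With shared storage the identity test holds; with copies the two results are equal in content but are different objects, which is exactly what would surprise code that relies on identity.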
In reply to this post by Andreas Rosenberg
… all fair points.
I don't think that it should be treated as a feature however - I reiterate that it is bad practice to use this deliberately.

Yours
Bob
|
In reply to this post by Todor Todorov
"The correct solution will be to implement immutable (read-only) objects. Other dialects have that feature. An achievable solution, but not easy to implement, is to subclass String and re-implement all methods that may modify the object to signal an exception. Then hack the compiler to compile to those read-only string literals ... and go through all existing methods to replace the string literal with read-only strings."
Yes, I was thinking something like that too. As I noticed that Symbol overrides #at:put: to prevent modification, the same could be done for subclasses of String and Array (say, StringLiteral and ArrayLiteral).

But like you said, the simplest solution is to not modify literals (which I don't normally do anyway).
|
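As a rough illustration of the "read-only literal" idea discussed above, here is a sketch in C (chosen only because it is easy to compile and check; it is not how Visual Smalltalk or any other dialect implements it). Everything in it is hypothetical: struct text plays the role of a StringLiteral-like class, text_at_put the role of an overridden #at:put: that refuses to modify a literal, and text_copy the role of #copy, which answers a writable object.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical string object carrying a read-only flag. */
struct text {
    char   chars[16];
    size_t length;
    int    read_only;   /* 1 for objects created as literals */
};

/* Builds a text from a C string, truncating if it does not fit. */
struct text text_make(const char *source, int read_only)
{
    struct text t;
    size_t n;

    assert(source != NULL);
    memset(&t, 0, sizeof t);
    n = strlen(source);
    if (n >= sizeof t.chars) {
        n = sizeof t.chars - 1;
    }
    memcpy(t.chars, source, n);
    t.length = n;
    t.read_only = read_only;
    return t;
}

/* Counterpart of #at:put: with a 1-based index. Answers 0 on success,
   -1 if the object is read-only or the index is out of range. */
int text_at_put(struct text *t, size_t index, char c)
{
    assert(t != NULL);
    if (t->read_only || index < 1 || index > t->length) {
        return -1;
    }
    t->chars[index - 1] = c;
    return 0;
}

/* Counterpart of #copy: the copy is always writable. */
struct text text_copy(const struct text *t)
{
    struct text copy;

    assert(t != NULL);
    copy = *t;
    copy.read_only = 0;
    return copy;
}
```

The point of the design is that the check lives in the mutator, so a caller who wants to change the text has to ask for a copy first, and the literal itself stays as compiled.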
In reply to this post by Andreas Rosenberg
A few things to note about your C example:

1) You are declaring the string as static, which means you are explicitly saying that it will be assigned the first time the function is run and the variable will still contain the same data after the function returns, sort of like a global (if my understanding of this "static" usage is correct). Whereas in Smalltalk, I did not expect a modified literal to stay that way after the function returned (I didn't think of it as "static").

2) Doesn't declaring it in that way mean that the literal ("0123456789ABCDEFGHI\0") is *copied* into the "string[20]" array? (This is just what I found by a quick Google search.) Declaring a pointer to the literal would not work:

    char *foo(int arg)
    {
        static char* string = "0123456789ABCDEFGHI\0";
        if (arg) {
            string[0] = 'a';    /* undefined behaviour: writes into the literal itself */
        }
        return string;
    }

In other words, modifying string literals in C does not technically work, yet it is possible in Smalltalk.

Perhaps it is not strange and I should have known this, but at least I learned something today :)

-----Original Message-----
From: Using Visual Smalltalk for Windows/Enterprise [mailto:[hidden email]] On Behalf Of Andreas Rosenberg
Sent: Thursday, 4 November 2010 12:19 AM
To: [hidden email]
Subject: Re: Modifying a string literal

Hi everybody,

>This behaviour is unexpected and different from other languages. Is
>anyone else aware of this strange behaviour?

Not really strange. Consider this C example:

    #include <stdio.h>

    char *foo(int arg)
    {
        static char string[20]="0123456789ABCDEFGHI\0";
        if (arg) {
            string[0] = 'a';    /* changes the one shared array */
        }
        return string;
    }

    int main(int argc, char* argv[])
    {
        printf("%s\r\n",foo(1));
        printf("%s\r\n",foo(0));
        return 0;
    }

Exactly the same problem as you described. If a method/function is returning literal objects, this data must be stored somewhere. If code modifies this data that is created at compile time, this modification is permanent. This should be basic knowledge for every programmer.
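The difference raised in point 2) can be sketched as follows. This is only an illustration, and the names patched_array, literal_view and literal_demo are invented for it. An array initialised from a literal holds its own writable copy of the characters, and static makes that copy live for the whole program; a pointer initialised from a literal refers to the literal itself, which a C program is not allowed to modify (the behaviour is undefined, and on many systems the write faults), so the sketch declares it const and only reads through it.

```c
#include <assert.h>
#include <string.h>

/* Array initialised from a literal: the characters are copied into
   writable storage owned by the function. 'static' keeps that storage,
   and any change made to it, alive between calls. */
const char *patched_array(int modify)
{
    static char text[] = "0123456789";
    if (modify) {
        text[0] = 'a';
    }
    return text;
}

/* Pointer initialised from a literal: no copy is made. Writing through
   such a pointer is undefined behaviour in C, so the pointer is
   declared const and this function only reads. */
const char *literal_view(void)
{
    static const char *text = "0123456789";
    return text;
}

/* Walks through the sequence of calls discussed in the thread. */
void literal_demo(void)
{
    assert(strcmp(patched_array(0), "0123456789") == 0); /* untouched       */
    assert(strcmp(patched_array(1), "a123456789") == 0); /* modified        */
    assert(strcmp(patched_array(0), "a123456789") == 0); /* change persists */
    assert(strcmp(literal_view(), "0123456789") == 0);   /* read only       */
}
```

So the static array behaves like the Smalltalk method literal (one object, changes stick), while the closest C counterpart of an immutable literal is the const pointer.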
|
>1) You are declaring the string as static, which means you are explicitly saying that it will be assigned
>the first time the function is ran and the variable will still contain the same data after the function
>returns, sort of like a global (if my understanding of this "static" usage is correct).
>Where as in Smalltalk, i did not expect a modified literal to stay that way after the function returned (i
>didn't think of it as "static").

Technically a binary executable has 3 kinds of segments:

- code (the machine code)
- data (all kinds of literals)
- bss (uninitialized variables)

So technically it makes no difference whether a literal is used inside a function or assigned to a global variable. It will always be located in the data segment. Talking of a "sort of global variable" is not correct in my eyes, because it does not distinguish between the actual data and the access scope. A global variable has a global access scope. Static means that each invocation of function foo will use the identical instance of the string object (using Smalltalk terms), which will be accessed through the local variable "string". So in fact, it is the same as in Smalltalk. Each invocation of your method will access the identical instance of the string located in the compiled method.

You are right, if you omit the static keyword the string will be copied. The following asm code will be created:

    mov esi,offset $string
    lea edi,dword ptr [ebp-20]    ; create local buffer on stack
    mov ecx,5
    rep movsd                     ; string is being copied into local buffer
    cmp dword ptr [ebp+8],0
    je short @2
    @2:
    lea eax,dword ptr [ebp-20]    ; return address to buffer on stack!!

The bytes representing the string are copied into a local buffer of the function (the local variable string). BUT if you return this variable, you are effectively returning a pointer to the temporary buffer on the stack. When the function returns, this buffer becomes invalid and will sooner or later contain "random" bytes, because this stack space will be overwritten during later function calls. This is a common error for inexperienced C programmers. If you modify the data you will effectively modify the machine stack, which will cause crashes sooner or later.

So if you would like to return a pointer that remains valid after exiting the function, you MUST use static. Then you get this code:

    cmp dword ptr [ebp+8],0
    je short @2
    mov byte ptr [$string],97
    @2:
    mov eax,offset $string

So to speak: Smalltalk prevents a common C mistake by regarding all literals as static ;-)

Regards
Andreas

Andreas Rosenberg | eMail: [hidden email]
APIS GmbH         | Phone: +49 9482 9415-0
Im Haslet 42      | Fax: +49 9482 9415-55
93086 Wörth/D     | WWW: <http://www.apis.de/>
Germany           | <http://www.fmea.de/>
|
James Foster wrote:
*snip*

> Of course, the way to get what you want is to send #'copy' to the method literal before storing it in 'theString' method temporary.
> Then you get a new instance each time and can modify it without side-effects.

IIRC #copy in VS(E) only copies a reference, but does not create a new object. You have to use deepCopy for that.

How did that example go:

    | a b |
    a := 'abcd'.
    b := a.
    a at: 1 put: $e.
    ^b
|
No, variable assignment copies the reference. Example:

    a := Object new.
    b := a.    "Variable b now has a copy of the object reference that is stored in a."

The #copy method usually does a #shallowCopy, which usually copies ONE level of object references. Example:

    o1 := Object new.
    o2 := Object new.
    a := Array with: o1 with: o2.
    b := a copy.           "Same as #shallowCopy."
    b := a shallowCopy.    "Creates a new array and copies the contents of the array, but not the contained objects themselves. Equivalent to:"
    b := Array with: o1 with: o2.

The #deepCopy is usually implemented to copy TWO levels down.

    o1 := Object new.
    o2 := Object new.
    a := Array with: o1 with: o2.
    b := a deepCopy.       "Creates a new array and copies the contents of the array, while copying the contained objects themselves. Equivalent to:"
    b := Array with: o1 shallowCopy with: o2 shallowCopy.

Some objects have re-implemented #copy where more knowledge exists about the copy process. An example is a Dictionary (HashedCollection), which technically contains several child objects but conceptually is a collection of associations. See the documentation of those objects for more info. When using the copy messages, always check whether a local implementation exists and whether it has any interesting comments on usage.

-- Todor

-----Original Message-----
From: Using Visual Smalltalk for Windows/Enterprise [mailto:[hidden email]] On Behalf Of Claus Kick
Sent: 4. november 2010 18:21
To: [hidden email]
Subject: Re: Modifying a string literal
|
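The difference between assignment, #copy and #deepCopy can be checked directly with identity tests (==). The following is only a workspace sketch: the variable names are made up for illustration, and the #deepCopy comment assumes the usual implementation described above, which may differ in detail between dialects.

```smalltalk
| original alias copied inner outer shallow deep |

"Start from a copy so that no method literal is modified."
original := 'abcd' copy.
alias := original.          "assignment: a second reference to the same String"
copied := original copy.    "#copy: a new String holding the same characters"

original at: 1 put: $e.
"alias == original is true, so alias now reads 'ebcd';
 copied is a separate object and still reads 'abcd'."

inner := 'xy' copy.
outer := Array with: inner.
shallow := outer copy.      "new Array, but it shares its element with outer"
deep := outer deepCopy.     "new Array whose element is itself a copy (assumed usual behaviour)"

(shallow at: 1) at: 1 put: $z.
"inner now reads 'zy' because (shallow at: 1) == inner;
 (deep at: 1) is a different String and still reads 'xy'."
```

Inspecting `alias == original` and `copied == original` after running this should show that only assignment shares the object, which is why #copy is sufficient for a flat object such as a String.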
Yes, you are right of course. It has been a while since I last used any Smalltalk.

--
Claus Kick

"Wenn Sie mich suchen: Ich halte mich in der Nähe des Wahnsinns auf. Genauer gesagt auf der schmalen Linie zwischen Wahnsinn und Panik. Gleich um die Ecke von Todesangst, nicht weit weg von Irrwitz und Idiotie."

"If you are looking for me: I am somewhere near to lunacy. More clearly, on the narrow path between lunacy and panic. Right around the corner of fear of death, not far away from idiocy and insanity."

-----Original Message-----
From: "Todor Todorov" <[hidden email]>
Sent: 04.11.2010 19:12:13
To: [hidden email]
Subject: Re: [VSWE-L] Modifying a string literal
|
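With the #copy question settled, James's suggestion applied to the method from the original post might look roughly like the sketch below. It is written as a workspace block so it can be evaluated on its own; `fill`, `first` and `second` are hypothetical names, and the block merely stands in for the body of #testStringLiteral:.

```smalltalk
| fill first second |

"fill stands in for the body of #testStringLiteral: (hypothetical name)."
fill := [:aBoolean | | theString |
    theString := '1234' copy.    "fresh String on every call; the literal stays untouched"
    aBoolean ifTrue: [
        theString at: 1 put: $A.
        theString at: 2 put: $B.
        theString at: 3 put: $C.
        theString at: 4 put: $D].
    theString].

first := fill value: true.      "answers 'ABCD'"
second := fill value: false.    "answers '1234', unaffected by the earlier call"
```

Because every call works on its own copy, the String held in the literal frame is never written to, so the result no longer depends on earlier calls.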