Chris Muller uploaded a new version of Collections to project The Inbox:
http://source.squeak.org/inbox/Collections-cmm.836.mcz ==================== Summary ==================== Name: Collections-cmm.836 Author: cmm Time: 4 June 2019, 2:34:30.820747 pm UUID: 95d228d9-2fda-4e35-9665-d1daf90cbd99 Ancestors: Collections-cmm.835 - Move utility methods of Collection to 'utilities'. - #joinSeparatedBy: is useful even for non-SequenceableCollections. - Speed up String>>#subStrings:. =============== Diff against Collections-cmm.835 =============== Item was changed: + ----- Method: Collection>>asCommaString (in category 'utilities') ----- - ----- Method: Collection>>asCommaString (in category 'printing') ----- asCommaString "Return collection printed as 'a, b, c' " + ^ self joinSeparatedBy: ', '! - - ^String streamContents: [:s | self asStringOn: s delimiter: ', '] - ! Item was changed: + ----- Method: Collection>>asCommaStringAnd (in category 'utilities') ----- - ----- Method: Collection>>asCommaStringAnd (in category 'printing') ----- asCommaStringAnd "Return collection printed as 'a, b and c' " ^String streamContents: [:s | self asStringOn: s delimiter: ', ' last: ' and '] ! Item was changed: + ----- Method: Collection>>asStringOn:delimiter: (in category 'utilities') ----- - ----- Method: Collection>>asStringOn:delimiter: (in category 'printing') ----- asStringOn: aStream delimiter: delimString "Print elements on a stream separated with a delimiter String like: 'a, b, c' Uses #asString instead of #print:." self do: [:elem | aStream nextPutAll: elem asString] separatedBy: [aStream nextPutAll: delimString]! Item was changed: + ----- Method: Collection>>asStringOn:delimiter:last: (in category 'utilities') ----- - ----- Method: Collection>>asStringOn:delimiter:last: (in category 'printing') ----- asStringOn: aStream delimiter: delimString last: lastDelimString "Print elements on a stream separated with a delimiter between all the elements and with a special one before the last like: 'a, b and c'. Uses #asString instead of #print: Note: Feel free to improve the code to detect the last element." | n sz | n := 1. sz := self size. self do: [:elem | n := n + 1. aStream nextPutAll: elem asString] separatedBy: [ aStream nextPutAll: (n = sz ifTrue: [lastDelimString] ifFalse: [delimString])]! Item was changed: + ----- Method: Collection>>histogramOf: (in category 'utilities') ----- - ----- Method: Collection>>histogramOf: (in category 'converting') ----- histogramOf: aBlock ^ self collect: aBlock as: Bag! Item was added: + ----- Method: Collection>>join (in category 'utilities') ----- + join + "Example: #(H e l l o W o r l d) join = 'HelloWorld'." + ^ self joinSeparatedBy: String empty! Item was added: + ----- Method: Collection>>joinSeparatedBy: (in category 'utilities') ----- + joinSeparatedBy: aString + "Returns a string, which is a concatenation of each element's string representation separated by another string." + ^ String streamContents: + [ : stream | self asStringOn: stream delimiter: aString ]! Item was changed: + ----- Method: Collection>>topologicallySortedUsing: (in category 'utilities') ----- - ----- Method: Collection>>topologicallySortedUsing: (in category 'converting') ----- topologicallySortedUsing: aSortBlock "Answer a SortedCollection whose elements are the elements of the receiver, but topologically sorted. The topological order is defined by the argument, aSortBlock." | aSortedCollection | aSortedCollection := SortedCollection new: self size. aSortedCollection sortBlock: aSortBlock. self do: [:each | aSortedCollection addLast: each]. "avoids sorting" ^ aSortedCollection sortTopologically ! Item was removed: - ----- Method: SequenceableCollection>>join (in category 'converting') ----- - join - "Example: #(H e l l o W o r l d) join = 'HelloWorld'. " - - ^ self joinSeparatedBy: ''! Item was removed: - ----- Method: SequenceableCollection>>joinSeparatedBy: (in category 'converting') ----- - joinSeparatedBy: aSeparator - "Returns a string, which is a concatenation of each element's string representation separated by another string." - - ^ String streamContents: [:stream | - self - do: [:ea | stream nextPutAll: ea asString] - separatedBy: [stream nextPutAll: aSeparator asString]]! Item was changed: ----- Method: String>>subStrings: (in category 'converting') ----- subStrings: separators + "Answer an array containing the substrings in the receiver separated by the elements of separators." - "Answer an array containing the substrings in the receiver separated - by the elements of separators." | char result sourceStream subString | + (separators isString or: + [ separators allSatisfy: + [ : element | element isCharacter ] ]) ifFalse: [ ^ self error: 'separators must be Characters.' ]. - #Collectn. - "Changed 2000/04/08 For ANSI <readableString> protocol." - (separators isString or:[separators allSatisfy: [:element | element isCharacter]]) ifFalse: - [^ self error: 'separators must be Characters.']. sourceStream := ReadStream on: self. result := OrderedCollection new. + subString := WriteStream on: String empty. + [ sourceStream atEnd ] whileFalse: + [ char := sourceStream next. + (separators includes: char) + ifTrue: + [ subString isEmpty ifFalse: + [ result add: subString contents. + subString resetContents ] ] + ifFalse: [ subString nextPut: char ] ]. + subString isEmpty ifFalse: [ result add: subString contents ]. - subString := String new. - [sourceStream atEnd] - whileFalse: - [char := sourceStream next. - (separators includes: char) - ifTrue: [subString notEmpty - ifTrue: - [result add: subString copy. - subString := String new]] - ifFalse: [subString := subString , (String with: char)]]. - subString notEmpty ifTrue: [result add: subString copy]. ^ result asArray! |
I don't much like WriteStream on: String empty. If String empty is implemented as answering a literal, and WritStream implemented to become the grown contents as it originally did in st80 and could also do in Squeak now that we have a fast become, we take the risk to modify the literal... String new better express our intention, we want a new substring. Also, contents does not necessarily answer a copy, it could answer the verbatim contents if full. So i hope that resetContents would perform the copy, but that's far from certain without seing code. We thus risk to have substrings sharing identity, and overwrite previous substring... Note that we can also efficiently search index of any of a character set, starting at a given index in case of byte string... Le mar. 4 juin 2019 à 21:34, <[hidden email]> a écrit : Chris Muller uploaded a new version of Collections to project The Inbox: |
Hi Nicolas,
> I don't much like WriteStream on: String empty. > If String empty is implemented as answering a literal, and WritStream implemented to become the grown contents as it originally did in st80 and could also do in Squeak now that we have a fast become, we take the risk to modify the literal... We both know that's not going to happen. #empty has been a class-side constructor/accessor on several classes for a long time, with > 100 senders in trunk alone. It's an integrated part of the class-library's API which improves performance and readability. Any attempt to switch back to a become implementation for growing internal collections would spoil that API everywhere. > String new better express our intention, we want a new substring. But sometimes we trade intention-revealingness for performance. That's what this is. I would offer the compromise of WriteStream on: '', but with #empty being an integrated part of the API, it's the intent as everywhere else it's used. > Also, contents does not necessarily answer a copy, But it does. WriteStream>>#contents answers a #copyFrom:to: and even the comment says "Answer with a copy ..." - Chris |
Hi Chris, -1 for moving #join etc. up to Collection. You seem to favor some performance improvement over code readability. Making the order of the collection explicit before joining it feels more robust. #joinSeparatedBy: on non-sequencable collections has to much potential for bugs and unreadable code. See http://forum.world.st/The-Inbox-Collections-ct-827-mcz-tp5099876p5099964.html -- Having a non-seq. collection to join should not be a common case. -1 for mixing up separate concerns in a single commit. Please, try separating those things. Best, Marcel
|
Marcel Taeumel <[hidden email]> schrieb am Mi., 5. Juni 2019, 09:18:
Agreed. |
In reply to this post by Chris Muller-3
Le mer. 5 juin 2019 à 00:14, Chris Muller <[hidden email]> a écrit : Hi Nicolas, The number of senders does not matter. What is important is that it's programmer responsibility to make sure that this strings won't be mutated (become'd). Until we protect our literals... > String new better express our intention, we want a new substring.
I would be very surprised if you could measure any difference between String new and String empty in this specific case.
Performance-wise, we could also pre-allocate some capacity (String new: someHeuristiceSize). Knowing that some implementation did/might use a become, String empty is not future proof (if it has many senders, then it's even worse, once the literal mutated, many senders will miss-behave) > Also, contents does not necessarily answer a copy, It does not necessarily. For example in Xtreams, you'll see this method: XTSequenceableCollection>>close position < destination size ifTrue: [destination := destination copyFrom: 1 to: (position min: length)] The copy is omitted if unnecessary. I think that VW version additionally use destination become: (but I do not have a copy handy) We can consider that contents always copy as part of the contract. It's a matter of depending on very short contract, or long and detailed contract. Classical price of optimization... |
>> > I don't much like WriteStream on: String empty.
>> > If String empty is implemented as answering a literal, and WritStream implemented to become the grown contents as it originally did in st80 and could also do in Squeak now that we have a fast become, we take the risk to modify the literal... >> >> We both know that's not going to happen. #empty has been a class-side >> constructor/accessor on several classes for a long time, with > 100 >> senders in trunk alone. It's an integrated part of the >> class-library's API which improves performance and readability. Any >> attempt to switch back to a become implementation for growing internal >> collections would spoil that API everywhere. >> > The number of senders does not matter. > What is important is that it's programmer responsibility to make sure that this strings won't be mutated (become'd). > > Until we protect our literals... Fair enough. I respect your argument. >> > String new better express our intention, we want a new substring. >> >> But sometimes we trade intention-revealingness for performance. >> That's what this is. >> >> I would offer the compromise of WriteStream on: '', >> but with #empty >> being an integrated part of the API, it's the intent as everywhere >> else it's used. >> >> > I would be very surprised if you could measure any difference between String new and String empty in this specific case. Yes, especially since we're talking about making a collection of subStrings: anyway... :) I'm convinced. >> > Also, contents does not necessarily answer a copy, >> >> But it does. WriteStream>>#contents answers a #copyFrom:to: and even >> the comment says "Answer with a copy ..." >> >> - Chris >> > It does not necessarily. > For example in Xtreams, you'll see this method: > > XTSequenceableCollection>>close > position < destination size ifTrue: [destination := destination copyFrom: 1 to: (position min: length)] > > The copy is omitted if unnecessary. How does XTSequenceableCollection>>#close from an external package relate to use of a temporary WriteStream inside the method, String>>#subStrings:? > I think that VW version additionally use destination become: (but I do not have a copy handy) > We can consider that contents always copy as part of the contract. > It's a matter of depending on very short contract, or long and detailed contract. > Classical price of optimization... > > > |
In reply to this post by marcel.taeumel
Hi Marcel,
You're free to write your code the way you feel you need to, but I hope going forward you'll allow others to do the same. Even if your readability argument wasn't so flawed, trying to force others to write what you think is good code is utterly futile. I'll keep it in Collection in my Base package for now. Just so you know, it IS a common case to need it for non-seq collections -- it's how this whole discussion got started, remember? - Chris > -1 for moving #join etc. up to Collection. You seem to favor some performance improvement over code readability. Making the order of the collection explicit before joining it feels more robust. #joinSeparatedBy: on non-sequencable collections has to much potential for bugs and unreadable code. See http://forum.world.st/The-Inbox-Collections-ct-827-mcz-tp5099876p5099964.html -- Having a non-seq. collection to join should not be a common case. > > -1 for mixing up separate concerns in a single commit. Please, try separating those things. > > Best, > Marcel > > Am 05.06.2019 00:14:53 schrieb Chris Muller <[hidden email]>: > > Hi Nicolas, > > > I don't much like WriteStream on: String empty. > > If String empty is implemented as answering a literal, and WritStream implemented to become the grown contents as it originally did in st80 and could also do in Squeak now that we have a fast become, we take the risk to modify the literal... > > We both know that's not going to happen. #empty has been a class-side > constructor/accessor on several classes for a long time, with > 100 > senders in trunk alone. It's an integrated part of the > class-library's API which improves performance and readability. Any > attempt to switch back to a become implementation for growing internal > collections would spoil that API everywhere. > > > String new better express our intention, we want a new substring. > > But sometimes we trade intention-revealingness for performance. > That's what this is. > > I would offer the compromise of WriteStream on: '', but with #empty > being an integrated part of the API, it's the intent as everywhere > else it's used. > > > Also, contents does not necessarily answer a copy, > > But it does. WriteStream>>#contents answers a #copyFrom:to: and even > the comment says "Answer with a copy ..." > > - Chris > > |
In reply to this post by Chris Muller-3
Le mer. 5 juin 2019 à 22:50, Chris Muller <[hidden email]> a écrit : >> > I don't much like WriteStream on: String empty. If instead of explicitly allocating new substring target, we decide to reuse a recipient for optimization purpose, we then depend on implementation details, like assuming that contents does always copy. Xtreams illustrates that such details may vary from one stream implementation to another. As an alternate stream implementation based on composition rather than inheritance, Xtreams has been seriously considered as a potential Stream replacement, at least in Pharo, so it's kind of Stream refactoring that may happen. If we authorize ourself to omit a copy at one place (substrings) we might as well do it at a more general place (contents) for same reasons... We can also take such implementation details as granted and consider that they are part of the contract. The longer the contracts, the greater the chance to break the contract when we want to evolve/refactor the features. That's often the price we pay for optimizing... I'm ok with those design decisions if it really makes a measurable difference, but we might better add tests to make the contract explicit.
|
In reply to this post by Chris Muller-3
Hi Chris, a good exercise would be to read unfamiliar code on a regular basis. Unfortunately, there not many hard, objective metrics to asses readability. So, there are mostly opinions and customs. As a consequence, my argument for readability cannot be flawed in a general sense. ;-) Here is another summary of my thoughts: - #joinSeparatedBy: concatenates elements in collections of some stable order. - For other collections, a debugger would appear showing MNU. - The programmer can now notice and asses the unstable order of that collection. - The programmer can then choose to change the kind of collection (1) somwhere meaningful in the program or (2) directly before that join operation to make some order explicit. If it is just for debugging, an extra #asArray wouldn't hurt. If it is for production code, such inadvertent instability of collection order can make the live unnecessarily hard. Of course, Chris, you can read this as just another opinion of mine. You can also try to explain your two points somehow: (1) How big is that performance impact in your daily code? (2) Why don't you like the idea of making stable order in a collection more explicit somewhere in the code? Best, Marcel
|
Hi Marcel,
> a good exercise would be to read unfamiliar code on a regular basis. Unfortunately, there not many hard, objective metrics to asses readability. So, there are mostly opinions and customs. As a consequence, my argument for readability cannot be flawed in a general sense. ;-) Nor can it be valid, for the same reason. But that's not the flaw I was referring to. Here's why I will keep #joinSeparatedBy: on Collection: https://graphql.github.io/graphql-spec/June2018/#ImplementsInterfaces That spec says the list of interfaces of an ObjectTypeDefinition are joined and separated by '&'. But the order doesn't matter, especially to machines, which is the actor I'm printing for. So, for this general utility method to impose an order in sender contexts that absolutely /do not care/ becomes an unnecessary conversion that imposes an unnecessary cost, including even in readability. Being meaningless in that context, so it's easy to imagine it tripping up future readers forever, "why do we need this #asArray here?" What would the answer to that question be? What should it be? > - #joinSeparatedBy: concatenates elements in collections of some stable order. > - For other collections, a debugger would appear showing MNU. > - The programmer can now notice and asses the unstable order of that collection. A strange use of the word "unstable" here. They're simply unordered, Marcel. We both know UnorderedCollections are common in many designs. > - The programmer can then choose to change the kind of collection (1) somwhere meaningful in the program or (2) directly before that join operation to make some order explicit. You keep saying, "The programmer can...," but that's not the issue being argued, is it? The argument is about unnecessarily "forcing" the programmer into doing a conversion they don't want. > a good exercise would be to read unfamiliar code on a regular basis. Best, Chris > If it is just for debugging, an extra #asArray wouldn't hurt. If it is for production code, such inadvertent instability of collection order can make the live unnecessarily hard. > > Of course, Chris, you can read this as just another opinion of mine. You can also try to explain your two points somehow: (1) How big is that performance impact in your daily code? (2) Why don't you like the idea of making stable order in a collection more explicit somewhere in the code? > > Best, > Marcel > > Am 05.06.2019 23:40:22 schrieb Chris Muller <[hidden email]>: > > Hi Marcel, > > You're free to write your code the way you feel you need to, but I > hope going forward you'll allow others to do the same. Even if your > readability argument wasn't so flawed, trying to force others to write > what you think is good code is utterly futile. > > I'll keep it in Collection in my Base package for now. Just so you > know, it IS a common case to need it for non-seq collections -- it's > how this whole discussion got started, remember? > > - Chris > > > -1 for moving #join etc. up to Collection. You seem to favor some performance improvement over code readability. Making the order of the collection explicit before joining it feels more robust. #joinSeparatedBy: on non-sequencable collections has to much potential for bugs and unreadable code. See http://forum.world.st/The-Inbox-Collections-ct-827-mcz-tp5099876p5099964.html -- Having a non-seq. collection to join should not be a common case. > > > > -1 for mixing up separate concerns in a single commit. Please, try separating those things. > > > > Best, > > Marcel > > > > Am 05.06.2019 00:14:53 schrieb Chris Muller : > > > > Hi Nicolas, > > > > > I don't much like WriteStream on: String empty. > > > If String empty is implemented as answering a literal, and WritStream implemented to become the grown contents as it originally did in st80 and could also do in Squeak now that we have a fast become, we take the risk to modify the literal... > > > > We both know that's not going to happen. #empty has been a class-side > > constructor/accessor on several classes for a long time, with > 100 > > senders in trunk alone. It's an integrated part of the > > class-library's API which improves performance and readability. Any > > attempt to switch back to a become implementation for growing internal > > collections would spoil that API everywhere. > > > > > String new better express our intention, we want a new substring. > > > > But sometimes we trade intention-revealingness for performance. > > That's what this is. > > > > I would offer the compromise of WriteStream on: '', but with #empty > > being an integrated part of the API, it's the intent as everywhere > > else it's used. > > > > > Also, contents does not necessarily answer a copy, > > > > But it does. WriteStream>>#contents answers a #copyFrom:to: and even > > the comment says "Answer with a copy ..." > > > > - Chris > > > > > |
Free forum by Nabble | Edit this page |