Re: [ENH] isSeparator

Posted by Christoph Thiede on May 06, 2021; 8:32pm
URL: https://forum.world.st/ENH-isSeparator-tp5129517p5129519.html

Community support: Inlined changesets


--- isSeparator.1.cs ---

'From Squeak6.0alpha of 29 April 2021 [latest update: #20483] on 6 May 2021 at 10:21:24 pm'!

!Character methodsFor: 'testing' stamp: 'ct 5/6/2021 21:41'!
isSeparator
	"Answer whether the receiver is a separator such as space, cr, tab, line feed, or form feed."

	^ self encodedCharSet isSeparator: self! !


!EncodedCharSet class methodsFor: 'character classification' stamp: 'ct 5/6/2021 21:46'!
isSeparator: char
	"Answer whether char has the code of a separator in this encoding."

	^ self isSeparatorCode: char charCode! !

!EncodedCharSet class methodsFor: 'character classification' stamp: 'ct 5/6/2021 21:39'!
isSeparatorCode: anInteger
	"Answer whether anInteger is the code of a separator."

	^ Character separators includesCode: anInteger! !


!Unicode class methodsFor: 'character classification' stamp: 'ct 5/6/2021 21:51'!
isSeparatorCode: charCode
	"Answer whether charCode is a control character (general category Cc) or falls into the separator categories Zl through Zs."

	| cat |
	cat := self generalCategoryOf: charCode.
	^ cat = Cc or: [cat >= Zl and: [cat <= Zs]]! !
------
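
A note on the Unicode variant above: the range test appears to rely on the category constants Zl, Zp and Zs being numbered consecutively in the Unicode class, so that `cat >= Zl and: [cat <= Zs]` covers line, paragraph and space separators in one comparison. A more explicit spelling is sketched below. It is only an illustration under that assumption, and the class variable Zp, which the changeset does not name, is assumed to exist.

```smalltalk
isSeparatorCode: charCode
	"Sketch only: a hypothetical, more explicit variant of Unicode class >> isSeparatorCode:.
	Assumes the class variables Cc, Zl, Zp and Zs are all defined on Unicode.
	Answer whether charCode is a control character or a line, paragraph, or space separator."

	| cat |
	cat := self generalCategoryOf: charCode.
	^ cat = Cc or: [cat = Zl or: [cat = Zp or: [cat = Zs]]]
```

In both spellings the Cc test seems to make every control character a separator for the Unicode charset, not only tab, cr, line feed and form feed.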

--- withAllBlanksTrimmed.1.cs ---
'From Squeak6.0alpha of 29 April 2021 [latest update: #20483] on 6 May 2021 at 10:24:39 pm'!

!String methodsFor: 'converting' stamp: 'ct 5/6/2021 21:56'!
withBlanksTrimmed
	"Return a copy of the receiver from which leading and trailing blanks have been trimmed."

	| first last |
	first := self findFirst: [:character | character isSeparator not].
	first = 0 ifTrue: [^ '']. "no non-separator character"
	last := self findLast: [:character | character isSeparator not].
	last = 0 ifTrue: [last := self size].
	(first = 1 and: [last = self size]) ifTrue: [^ self copy].
	^ self copyFrom: first to: last! !


!StringTest methodsFor: 'tests - converting' stamp: 'ct 5/6/2021 22:00'!
testWithBlanksTrimmed

	| s |
	self assert: ' abc  d   ' withBlanksTrimmed = 'abc  d'.
	self assert: 'abc  d   ' withBlanksTrimmed = 'abc  d'.
	self assert: ' abc  d' withBlanksTrimmed = 'abc  d'.
	self assert: (((0 to: 255) collect: [:each | each asCharacter] thenSelect: [:each | each isSeparator]) as: String) withBlanksTrimmed = ''.
	self assert: ' nbsps around ' withBlanksTrimmed = 'nbsps around'.
	s := 'abcd'.
	self assert: s withBlanksTrimmed = s.
	self assert: s withBlanksTrimmed ~~ s! !


!Text methodsFor: 'converting' stamp: 'ct 5/6/2021 21:57'!
withBlanksTrimmed
	"Return a copy of the receiver from which leading and trailing blanks have been trimmed."

	| first last |
	first := string findFirst: [:character | character isSeparator not].
	first = 0 ifTrue: [^ '']. "no non-separator character"
	last := string findLast: [:character | character isSeparator not].
	last = 0 ifTrue: [last := self size].
	(first = 1 and: [last = self size]) ifTrue: [^ self copy].
	^ self copyFrom: first to: last! !
------


From: Thiede, Christoph
Sent: Thursday, 6 May 2021 22:30:56
To: [hidden email]
Subject: Re: [squeak-dev] [ENH] isSeparator
 

Hi all,


Here is another tiny changeset, which depends on isSeparator.cs: withAllBlanksTrimmed.cs makes #withBlanksTrimmed use the encoding-aware #isSeparator implementation, so that all kinds of blanks, not only the ASCII ones, are trimmed correctly from a string.
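
A quick workspace sketch of the intended effect, assuming both changesets have been filed in (the variable names are only for illustration):

```smalltalk
"Workspace sketch; assumes isSeparator.1.cs and withAllBlanksTrimmed.1.cs are loaded."
| nbsp padded |
nbsp := String with: (Character value: 16rA0). "no-break space, U+00A0"
padded := nbsp, ' abc  d ', nbsp.
padded withBlanksTrimmed "should answer 'abc  d' once #isSeparator recognizes U+00A0"
```

The inner double space is kept, because only leading and trailing separators are removed.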


Best,
Christoph

From: Squeak-dev <[hidden email]> on behalf of Thiede, Christoph
Sent: Thursday, 6 May 2021 22:27:57
To: [hidden email]
Subject: [squeak-dev] [ENH] isSeparator
 
Hi all,

Here is one tiny changeset for you: isSeparator.cs adds proper encoding-aware support for testing separator characters. Unlike the former implementation, it now also identifies non-ASCII separators such as the no-break space (U+00A0) correctly.
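
To try it out, something like the following workspace sketch could be used. It assumes isSeparator.1.cs has been filed in, and the list of candidate characters is only an illustration:

```smalltalk
"Workspace sketch; assumes isSeparator.1.cs is loaded."
| candidates |
candidates := {Character space. Character tab. Character cr. Character lf. Character value: 16rA0. $a}.
"Pair each character code with the answer of #isSeparator."
candidates collect: [:each | each charCode -> each isSeparator]
```

With the changeset loaded, every entry except $a should be reported as a separator, including the no-break space.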

Please review and merge! :-)

Best,
Christoph

["isSeparator.cs.gz"]


Carpe Squeak!