The Trunk: Collections-pre.762.mcz

commits-2
Patrick Rein uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-pre.762.mcz

==================== Summary ====================

Name: Collections-pre.762
Author: pre
Time: 29 August 2017, 4:50:11.458834 pm
UUID: d7838b91-7ce4-c34c-ac5a-c46cee281140
Ancestors: Collections-bf.761

Changes the HtmlReadWriter to deal correctly with nested tags and their mapping to text attributes. Also adds a comment to the class.

=============== Diff against Collections-bf.761 ===============

Item was changed:
  TextReadWriter subclass: #HtmlReadWriter
  instanceVariableNames: 'count offset runStack runArray string breakLines'
  classVariableNames: ''
  poolDictionaries: ''
  category: 'Collections-Text'!
+
+ !HtmlReadWriter commentStamp: 'pre 8/29/2017 16:14' prior: 0!
+ An HtmlReadWriter is used to read a Text object from a string containing HTML or to write a Text object to a string with HTML tags representing the text attributes.
+
+ It does two things currently:
+ 1) Setting text attributes on the beginning of tags, e.g. setting a bold text attribute when seeing a <b> tag.
+ 2) Changing the resulting string, e.g. replacing a <br> with a Character cr.
+
+ The implementation works by pushing attributes on a stack on every opening tag. On the corresponding closing tag, the attribute is popped from the stack and stored in an array of attribute runs. From this array and the collected characters the final Text is constructed.
+
+ ## Notes on the implementation
+ - The final run array is completely constructed while parsing so it has to be correct with regard to the length of the runs. There is no consolidation except for merging neighboring runs which include the same attributes.
+ - The *count* variable is the position in the source string; the *offset* is the number of skipped characters, for example the ones that denote a tag.
+ - The stack contains elements which are of the form: {text attributes. current start index. original start}!

Item was added:
+ ----- Method: HtmlReadWriter>>addCharacter: (in category 'private') -----
+ addCharacter: aCharacter
+
+ string add: aCharacter.
+ count := count + 1.!

Item was added:
+ ----- Method: HtmlReadWriter>>addString: (in category 'private') -----
+ addString: aString
+
+ string addAll: aString.
+ count := count + aString size.!

Item was changed:
  ----- Method: HtmlReadWriter>>isTagIgnored: (in category 'testing') -----
  isTagIgnored: aTag
 
  | space t |
+ t := aTag copyWithoutAll: '</>'.
+ space := t indexOf: Character space.
- space := aTag indexOf: Character space.
  t := space > 0
+ ifTrue: [t copyFrom: 1 to: space - 1]
+ ifFalse: [t].
- ifTrue: [aTag copyFrom: 2 to: space - 1]
- ifFalse: [aTag copyFrom: 2 to: aTag size - 1].
  ^ self ignoredTags includes: t!

Item was changed:
  ----- Method: HtmlReadWriter>>mapCloseCodeTag (in category 'mapping') -----
  mapCloseCodeTag
 
  | theDoIt |
  theDoIt := runStack top first
  detect: [:attribute | attribute isKindOf: TextDoIt]
  ifNone: [^ self "nothing found, ignore"].
+ theDoIt evalString: (String withAll: (string copyFrom: runStack top third to: string size)).!
- theDoIt evalString: (String withAll: (string copyFrom: runStack top second to: string size)).!

Item was changed:
+ ----- Method: HtmlReadWriter>>nextPutText: (in category 'private') -----
- ----- Method: HtmlReadWriter>>nextPutText: (in category 'accessing') -----
  nextPutText: aText
 
  | previous |
  previous := #().
  self activateAttributesEnding: #() starting: previous. "for consistency"
  aText runs
  withStartStopAndValueDo: [:start :stop :attributes |
  self
  deactivateAttributesEnding: previous starting: attributes;
  activateAttributesEnding: previous starting: attributes;
  writeContent: (aText string copyFrom: start to: stop).
  previous := attributes].
  self deactivateAttributesEnding: previous starting: #().!

Item was changed:
+ ----- Method: HtmlReadWriter>>nextText (in category 'private') -----
- ----- Method: HtmlReadWriter>>nextText (in category 'accessing') -----
  nextText
 
  count := 0.
  offset := 0. "To ignore characters in the input string that are used by tags."
 
  runStack := Stack new.
 
  runArray := RunArray new.
  string := OrderedCollection new.
 
+ "{text attributes. current start index. original start}"
+ runStack push: {OrderedCollection new. 1. 1}.
- "{text attributes. start index. end index. number of open tags}"
- runStack push: {OrderedCollection new. 1. nil. 0}.
 
  [stream atEnd] whileFalse: [self processNextTag].
  self processRunStackTop. "Add last run."
 
  string := String withAll: string.
  runArray coalesce.
 
  ^ Text
  string: string
  runs: runArray!

Item was changed:
  ----- Method: HtmlReadWriter>>processEmptyTag: (in category 'reading') -----
  processEmptyTag: aTag
 
  (aTag beginsWith: '<br') ifTrue: [
+ self addCharacter: Character cr.
- string add: Character cr.
- count := count + 1.
  ^ self].
 
+ (self isTagIgnored: aTag)
- (self ignoredTags includes: (aTag copyFrom: 2 to: aTag size - 3))
  ifTrue: [^ self].
 
+ "TODO... what?"!
- "TODO..."!

Item was changed:
  ----- Method: HtmlReadWriter>>processEndTag: (in category 'reading') -----
  processEndTag: aTag
 
  | index tagName |
  index := count - offset.
  tagName := aTag copyFrom: 3 to: aTag size - 1.
 
+ (self isTagIgnored: tagName) ifTrue: [^ self].
+
- (self ignoredTags includes: tagName) ifTrue: [^ self].
  tagName = 'code' ifTrue: [self mapCloseCodeTag].
  tagName = 'pre' ifTrue: [self breakLines: true].
-
- "De-Accumulate adjacent tags."
- runStack top at: 4 put: runStack top fourth - 1.
- runStack top fourth > 0
- ifTrue: [^ self "not yet"].
 
  self processRunStackTop.
 
  runStack pop.
  runStack top at: 2 put: index + 1.!

Item was changed:
  ----- Method: HtmlReadWriter>>processHtmlEscape: (in category 'reading') -----
  processHtmlEscape: aString
  | escapeSequence |
  escapeSequence := aString copyFrom: 2 to: aString size - 1.
  escapeSequence first = $# ifTrue: [^ self processHtmlEscapeNumber: escapeSequence allButFirst].
  (String htmlEntities at: (aString copyFrom: 2 to: aString size - 1) ifAbsent: [])
  ifNotNil: [:char |
+ self addCharacter: char].!
- string add: char.
- count := count + 1].!

Item was changed:
+ ----- Method: HtmlReadWriter>>processHtmlEscapeNumber: (in category 'private') -----
- ----- Method: HtmlReadWriter>>processHtmlEscapeNumber: (in category 'reading') -----
  processHtmlEscapeNumber: aString
  | number |
  number := aString first = $x
  ifTrue: [ '16r', aString allButFirst ]
  ifFalse: [ aString ].
+ self addCharacter: number asNumber asCharacter.
+ !
- string add: number asNumber asCharacter!

Item was changed:
  ----- Method: HtmlReadWriter>>processNextTag (in category 'reading') -----
  processNextTag
 
  | tag htmlEscape lookForNewTag lookForHtmlEscape tagFound valid inComment inTagString |
  lookForNewTag := true.
  lookForHtmlEscape := false.
  tagFound := false.
  tag := OrderedCollection new.
  htmlEscape := OrderedCollection new.
  inComment := false.
  inTagString := false.
 
  [stream atEnd not and: [tagFound not]] whileTrue: [
  | character |
  character := stream next.
  valid := (#(10 13) includes: character asciiValue) not.
  count := count + 1.
 
  character = $< ifTrue: [lookForNewTag := false].
+ character = $& ifTrue: [inComment ifFalse: [lookForHtmlEscape := true]].
- character = $& ifTrue: [
- inComment ifFalse: [lookForHtmlEscape := true]].
 
  lookForNewTag
  ifTrue: [
  lookForHtmlEscape
  ifFalse: [
  (valid or: [self breakLines not])
  ifTrue: [string add: character]
  ifFalse: [offset := offset + 1]]
  ifTrue: [valid ifTrue: [htmlEscape add: character]. offset := offset + 1]]
  ifFalse: [valid ifTrue: [tag add: character]. offset := offset + 1].
 
  "Toggle within tag string/text."
  (character = $" and: [lookForNewTag not])
  ifTrue: [inTagString := inTagString not].
 
  inComment := ((lookForNewTag not and: [tag size >= 4])
  and: [tag beginsWith: '<!!--'])
  and: [(tag endsWith: '-->') not].
 
  (((character = $> and: [inComment not]) and: [lookForNewTag not]) and: [inTagString not]) ifTrue: [
  lookForNewTag := true.
  (tag beginsWith: '<!!--')
  ifTrue: [self processComment: (String withAll: tag)]
  ifFalse: [tag second ~= $/
  ifTrue: [
  (tag atLast: 2) == $/
  ifTrue: [self processEmptyTag: (String withAll: tag)]
  ifFalse: [self processStartTag: (String withAll: tag)]]
  ifFalse: [self processEndTag: (String withAll: tag)]].
  tagFound := true].
 
  (((character = $; and: [lookForNewTag])
  and: [htmlEscape notEmpty]) and: [htmlEscape first = $&]) ifTrue: [
  lookForHtmlEscape := false.
  self processHtmlEscape: (String withAll: htmlEscape).
  htmlEscape := OrderedCollection new]].
  !

Item was changed:
  ----- Method: HtmlReadWriter>>processRunStackTop (in category 'reading') -----
  processRunStackTop
  "Write accumulated attributes to run array."
 
+ | currentIndex start attrs |
+ currentIndex := count - offset.
- | index start end attrs |
- index := count - offset.
-
- "Set end index."
- runStack top at: 3 put: index.
- "Write to run array."
  start := runStack top second.
- end := runStack top third.
  attrs := runStack top first.
  runArray
  addLast: attrs asArray
+ times: currentIndex - start + 1.!
- times: end - start + 1.!

Item was changed:
  ----- Method: HtmlReadWriter>>processStartTag: (in category 'reading') -----
  processStartTag: aTag
 
  | index |
  (self isTagIgnored: aTag) ifTrue: [^ self].
 
  index := count - offset.
 
  aTag = '<br>' ifTrue: [
+ self addCharacter: Character cr.
- string add: Character cr.
- count := count + 1.
  ^ self].
  (aTag beginsWith: '<img') ifTrue: [
+ self addString: '[image]'.
- string addAll: '[image]'.
- count := count + 7.
  ^ self].
 
+ self processRunStackTop. "To add all attributes before the next tag adds some."
- "Accumulate adjacent tags."
- (runStack size > 1 and: [runStack top second = (index + 1) "= adjacent start tags"])
- ifTrue: [
- runStack top at: 1 put: (runStack top first copy addAll: (self mapTagToAttribute: aTag); yourself).
- runStack top at: 4 put: (runStack top fourth + 1). "increase number of open tags"
- ^self].
-
- self processRunStackTop.
 
- "Remove start/end info to reuse attributes later."
- runStack top at: 2 put: nil.
- runStack top at: 3 put: nil.
  "Copy attr list and add new attr."
+ runStack push: ({runStack top first copy addAll: (self mapTagToAttribute: aTag); yourself. index + 1 . index + 1}).
+ !
- runStack push: ({runStack top first copy addAll: (self mapTagToAttribute: aTag); yourself. index + 1. nil. 1}).!
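
To try the change, here is a minimal workspace sketch; it is not part of the commit. It assumes an image in which String>>asTextFromHtml and Text>>asStringToHtml are available as convenience entry points to HtmlReadWriter; if your image lacks them, treat both selectors as hypothetical. The sketch parses a string with nested tags and prints each attribute run, which is the part this version is meant to get right.

```smalltalk
"Workspace sketch, not part of the commit.
 Assumption: String>>asTextFromHtml and Text>>asStringToHtml exist in the image."
| html text |
html := '<b>bold <i>bold and italic</i></b> plain'.
text := html asTextFromHtml.

"The characters that remain once the tags are stripped."
Transcript showln: text string.

"One line per attribute run: start index, stop index, attributes."
text runs withStartStopAndValueDo: [:start :stop :attributes |
	Transcript showln: start printString, '-', stop printString, ' ', attributes printString].

"Writing direction: turn the Text back into a string with HTML tags."
Transcript showln: text asStringToHtml.
```

With the nested input above, the run covering the italic part should carry both the bold and the italic attribute, because processStartTag: pushes a copy of the attributes already on top of the stack plus the new one.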


Re: The Trunk: Collections-pre.762.mcz

Hannes Hirzel
Thx for the comment. Very useful.

--Hannes

On Tue, 29 Aug 2017 14:50:19 +0000, [hidden email]
<[hidden email]> wrote:

> Patrick Rein uploaded a new version of Collections to project The Trunk:
> http://source.squeak.org/trunk/Collections-pre.762.mcz