HtmlParser MNU: ByteString>>replaceHtmlCharRefs

bpi
Dear Squeakers,

I tried to parse an HTML file like this in a trunk image and ran into an MNU (MessageNotUnderstood):
FileStream fileNamed: 'some.html' do: [:stream | HtmlParser parse: stream]

In HtmlText>>#initialize the message #replaceHtmlCharRefs is sent. I suppose this method was once in the image; otherwise HtmlParser would never have worked. How can I find out when it got lost? How would you do it?

Cheers,
Bernhard


Re: HtmlParser MNU: ByteString>>replaceHtmlCharRefs

Bob Arning-2
'From Squeak3.4 of 1 March 2003 [latest update: #5170] on 29 March 2003 at 6:47:54 pm'
"Change Set:        RemoveScamper
Date:            29 March 2003
Author:            Adam Spitz

Removes Scamper from the image (assuming all references to it have already been removed)."


String removeSelector: #replaceHtmlCharRefs.
Smalltalk organization removeCategoriesMatching: 'Network-HTML*'.
Smalltalk organization removeCategoriesMatching: 'Network-Web Browser'.

Utilities informUser: 'Removing Scamper thumbnails from Tools flap and PartsBin. Please wait...' during: [
    PartsBin clearThumbnailCache.
    PartsBin cacheAllThumbnails.
    Flaps replaceToolsFlap.

On 10/22/17 1:05 PM, Bernhard Pieber wrote:
Dear Squeakers,

I tried to parse an HTML file like this in a trunk image and ran into an MNU (MessageNotUnderstood):
FileStream fileNamed: 'some.html' do: [:stream | HtmlParser parse: stream]

In HtmlText>>#initialize the message #replaceHtmlCharRefs is sent. I suppose this method was once in the image; otherwise HtmlParser would never have worked. How can I find out when it got lost? How would you do it?

Cheers,
Bernhard





Re: HtmlParser MNU: ByteString>>replaceHtmlCharRefs

Bob Arning-2
In reply to this post by bpi

If you just want to replace it yourself, try this:


'From Squeak3.4alpha of ''11 November 2002'' [latest update: #5109] on 16 November 2002 at 8:06:43 pm'
"Change Set:        ISO8859
Date:            15 November 2002
Author:            Boris Gaertner

Jean-Marie Zajac pointed out that accented characters in ISO-8859-1 encoding are not displayed as expected. Scamper is not encoding-aware, but it translates ISO-8859-1 to the encoding that is used in Squeak. Unfortunately, due to a subtle bug the translation is done twice: first the entire source is translated, and later the parsed entities are translated again. This change set drops the translation of parsed entities. To make it work, it adds the translation of character entity references (characters that are written in the form &#<integer>; or in the form &<character name>; see sections 5.3.1 and 5.3.2 of the HTML 4.0 specification).

Jean-Marie tested a first version and found a new bug; later he tested a second version that is seemingly ok. With his test he helped me to understand where the real problem was buried. Thanks a lot!

"
HtmlText methodsFor: 'private-initialization' stamp: 'BG 11/15/2002 21:40'
initialize: source0
    super initialize: source0.
    self text: source0 replaceHtmlCharRefs.

String methodsFor: 'internet' stamp: 'BG 11/15/2002 21:18'
replaceHtmlCharRefs
    | pos ampIndex scIndex special specialValue outString outPos newOutPos |
    outString := String new: self size.
    outPos := 0.
    pos := 1.

    [pos <= self size] whileTrue: [
        "read up to the next ampersand"
        ampIndex := self indexOf: $& startingAt: pos ifAbsent: [0].
        ampIndex = 0 ifTrue: [
            pos = 1 ifTrue: [^self] ifFalse: [ampIndex := self size + 1]].

        "copy the text before the ampersand unchanged"
        newOutPos := outPos + ampIndex - pos.
        outString
            replaceFrom: outPos + 1
            to: newOutPos
            with: self
            startingAt: pos.
        outPos := newOutPos.
        pos := ampIndex.

        ampIndex <= self size ifTrue: [
            "find the $;"
            scIndex := self indexOf: $; startingAt: ampIndex ifAbsent: [self size + 1].
            special := self copyFrom: ampIndex + 1 to: scIndex - 1.
            specialValue := HtmlEntity valueOfHtmlEntity: special.
            specialValue
                ifNil: [
                    "not a recognized entity; write it back"
                    scIndex > self size ifTrue: [scIndex := self size].
                    newOutPos := outPos + scIndex - ampIndex + 1.
                    outString
                        replaceFrom: outPos + 1
                        to: newOutPos
                        with: self
                        startingAt: ampIndex.
                    outPos := newOutPos]
                ifNotNil: [
                    outPos := outPos + 1.
                    outString at: outPos put: specialValue isoToSqueak].
            pos := scIndex + 1]].

    ^outString copyFrom: 1 to: outPos


Re: HtmlParser MNU: ByteString>>replaceHtmlCharRefs

bpi
Hi Bob,

Thanks for your answer. It helped me find the method in a 3.4 image. How did you find the ChangeSets?

Cheers,
Bernhard

> On 22.10.2017 at 19:46, Bob Arning <[hidden email]> wrote:
>
> If you just want to replace it yourself, try this:

Squeak update stream history and old changesets (was: HtmlParser MNU: ByteString>>replaceHtmlCharRefs)

David T. Lewis
Andreas Raab assembled the historical update stream (change sets) and saved it on
the server at http://files.squeak.org/history/

There is also the http://files.squeak.org/updates/ folder, although I think that
the ./history folder is the most complete collection available.

Later updates were done with the Monticello update stream that is currently in use.

Dave



On Sun, Oct 22, 2017 at 09:42:14PM +0200, Bernhard Pieber wrote:

> Hi Bob,
>
> Thanks for your answer. It helped me find the method in a 3.4 image. How did you find the ChangeSets?
>
> Cheers,
> Bernhard

Re: HtmlParser MNU: ByteString>>replaceHtmlCharRefs

Bob Arning-2
In reply to this post by bpi

A few years ago I wrote a change browser covering as much of Squeak's history as I could find.


On 10/22/17 3:42 PM, Bernhard Pieber wrote:
Thanks for your answer. It helped me find the method in a 3.4 image. How did you find the ChangeSets?




Re: HtmlParser MNU: ByteString>>replaceHtmlCharRefs

bpi
Fascinating! Is this something you could maybe share? Or have you already? A quick mailing list and Google search turned up nothing.

Cheers,
Bernhard

> On 22.10.2017 at 22:10, Bob Arning <[hidden email]> wrote:
>
> A few years ago I wrote a change browser covering as much of Squeak's history as I could find.
>
> On 10/22/17 3:42 PM, Bernhard Pieber wrote:
>> Thanks for your answer. It helped me find the method in a 3.4 image. How did you find the ChangeSets?


Re: HtmlParser MNU: ByteString>>replaceHtmlCharRefs

Bob Arning-2

It was available for a while in 2013, but interest waned. You can search messages to the squeak list in 2013 with 'changes', 'evolution' or 'archaeology' in the title.

A (somewhat dated) code snapshot is at https://www.dropbox.com/l/scl/AAD3x0OLu3lxYlc0IhiKNXKQzRLCrMZvqtA


On 10/22/17 4:16 PM, Bernhard Pieber wrote:
Fascinating! Is this something you could maybe share? Or have you already? A quick mailing list and Google search turned up nothing.

Cheers,
Bernhard


Re: Squeak update stream history and old changesets (was: HtmlParser MNU: ByteString>>replaceHtmlCharRefs)

bpi
In reply to this post by David T. Lewis
That is a great resource! I downloaded all zip files and I could easily find a ChangeSet with the missing method. Good to know for the future.

Cheers,
Bernhard
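
The search described above can be scripted outside the image. What follows is a minimal Python sketch, not something that was posted in the thread. It assumes the zip files from http://files.squeak.org/history/ were saved unchanged into one local directory and that the change sets inside them are stored as plain text; members that are themselves compressed would need extra handling. The function name `find_selector` and the directory name used in the note below are made up for illustration.

```python
import zipfile
from pathlib import Path


def find_selector(archive_dir, selector):
    """Yield (zip name, member name, line number, line) for every line
    in every zip member that mentions the given selector."""
    needle = selector.encode("ascii")
    for zip_path in sorted(Path(archive_dir).glob("*.zip")):
        with zipfile.ZipFile(zip_path) as archive:
            for info in archive.infolist():
                if info.is_dir():
                    continue
                data = archive.read(info)
                # Old change sets use bare CR line ends; bytes.splitlines()
                # splits on CR, LF and CRLF alike.
                for number, line in enumerate(data.splitlines(), start=1):
                    if needle in line:
                        # Latin-1 decoding never fails, whatever the bytes are.
                        text = line.decode("latin-1").strip()
                        yield zip_path.name, info.filename, number, text
```

With the archives in a directory called `history`, looping over `find_selector("history", "replaceHtmlCharRefs")` and printing each tuple lists every change set that mentions the selector. That covers the definition as well as removals such as `String removeSelector: #replaceHtmlCharRefs.`, which is the line that answers when the method got lost.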

> On 22.10.2017 at 22:01, David T. Lewis <[hidden email]> wrote:
>
> Andreas Raab assembled the historical update stream (change sets) and saved it on
> the server at http://files.squeak.org/history/
>
> There is also the http://files.squeak.org/updates/ folder, although I think that
> the ./history folder is the most complete collection available.
>
> Later updates were done with the Monticello update stream that is currently in use.
>
> Dave