XMLHTMLParser Entity Handling oddity

Udo Schneider
All,

I'm hitting an interesting issue with XMLHTMLParser and I'm not even
sure if this is a bug or intended behaviour. Given an HTML Entity in a
String it's resolved or quoted depending on the tag (header or section tag):

doc := XMLHTMLParser parse:
'<html><head><title>&Uuml;</title></head><body>&Uuml;</body></html>'.
(doc findElementNamed: 'title') contentString. "'&Uuml;'"
(doc findElementNamed: 'body') contentString.  "'Ü'"

In my understanding and according to
https://www.w3.org/TR/html401/struct/global.html#h-7.4.2 Entities in the
title tag are allowed and should IMHO be resolved.

So both should return 'Ü' in this case.
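
A quick way to check this in a playground, as a minimal sketch: it assumes only the selectors already used in the snippet above, and the temporary names titleText and bodyText are made up for illustration.

```smalltalk
"Sketch only: same XMLHTMLParser selectors as in the snippet above."
| doc titleText bodyText |
doc := XMLHTMLParser parse:
	'<html><head><title>&Uuml;</title></head><body>&Uuml;</body></html>'.
titleText := (doc findElementNamed: 'title') contentString.
bodyText := (doc findElementNamed: 'body') contentString.
"Should answer true if entities are resolved the same way in both elements."
titleText = bodyText
```

With the behaviour described above, the comparison would answer false, because only the body text has the entity resolved.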

Any pointers?

CU,

Udo


Re: XMLHTMLParser Entity Handling oddity

monty-3
This should be fixed now. Thanks for the bug report.

> Sent: Wednesday, May 03, 2017 at 4:44 PM
> From: "Udo Schneider" <[hidden email]>
> To: [hidden email]
> Subject: [Pharo-users] XMLHTMLParser Entity Handling oddity

Re: XMLHTMLParser Entity Handling oddity

Stephane Ducasse-3
Tx monty for the fix!

On Fri, May 5, 2017 at 7:28 PM, monty <[hidden email]> wrote:
This should be fixed now. Thanks for the bug report.


Re: XMLHTMLParser Entity Handling oddity

Udo Schneider
In reply to this post by monty-3
Perfect! Thank you very very much!

On 05/05/17 at 19:28, monty wrote:

> This should be fixed now. Thanks for the bug report.



Re: XMLHTMLParser Entity Handling oddity

Stephane Ducasse-3
Hi guys

It would be supercool to have a chapter on the XML package.
Do any of you have the knowledge to write it?
I do not have it.

Stef


On Sat, May 6, 2017 at 9:51 AM, Udo Schneider <[hidden email]> wrote:
Perfect! Thank you very very much!

Re: XMLHTMLParser Entity Handling oddity

monty-3
Yes, but at this point it will probably be a booklet, like the Glorp and SmaCC ones you posted.

> Sent: Saturday, May 06, 2017 at 6:19 AM
> From: "Stephane Ducasse" <[hidden email]>
> To: "Any question about pharo is welcome" <[hidden email]>
> Subject: Re: [Pharo-users] XMLHTMLParser Entity Handling oddity

Re: XMLHTMLParser Entity Handling oddity

Stephane Ducasse-3
Hi monty,

Yes, I would love to have a booklet, and I can help with reading, reviewing, and producing it.
Tell me what you want.
You could also start by writing short (2-3 page) blog posts, and we can turn them into a booklet.

Stef

On Sun, May 7, 2017 at 1:37 AM, monty <[hidden email]> wrote:
Yes, but at this point it will probably be a booklet, like the Glorp and SmaCC ones you posted.
