Error parsing XML File

Error parsing XML File

Fabrizio Perin-3
Hi,
I was parsing an XML file with the latest version of the XML parser (XML-Parser-JAAyer.68) and I got an error about a non-UTF-8 character that the parser found in the document. The XML document contains some German characters:

<![CDATA[SES: Bean BLABLABLA Eidgenössisches Institut für BLABLALBLA]]>

I'm not sure whether the error is in the UTF8TextConverter or whether something is wrong in how the parser invokes it. In any case, I parsed the same document several times with an older version of the XML parser (XML-Parser-JAAyer.57) and it always worked. I'm not sure whether the Pharo mailing list is the right place to report this problem; if it isn't, I'm sorry.

Here is the trace from the log:

Error: Invalid utf8 input detected
26 March 2010 4:14:07 pm

VM: Mac OS - intel - 1062 - Squeak3.8.1 of '28 Aug 2006' [latest update: #6747] Squeak VM 4.2.2b1
Image: Pharo-1.0-10515-rc3 [Latest update: #10515]

SecurityManager state:
Restricted: false
FileAccess: true
SocketAccess: true
Working Dir /Users/fabrizioperin/development/Pharo/WORKINGONNOW/MooseJEE_64
Trusted Dir /foobar/tooBar/forSqueak/bogus
Untrusted Dir /Users/fabrizioperin/Library/Preferences/Squeak/Internet/My Squeak

UTF8TextConverter(Object)>>error:
Receiver: an UTF8TextConverter
Arguments and temporary variables: 
aString: 'Invalid utf8 input detected'
Receiver's instance variables: 
an UTF8TextConverter

UTF8TextConverter>>errorMalformedInput
Receiver: an UTF8TextConverter
Arguments and temporary variables: 

Receiver's instance variables: 
an UTF8TextConverter

UTF8TextConverter>>nextFromStream:
Receiver: an UTF8TextConverter
Arguments and temporary variables: 
aStream: MultiByteFileStream: '/Users/fabrizioperin/development/Pharo/WORKINGON...etc...
character1: $¶
value1: 182
character2: $s
value2: 115
unicode: nil
character3: $s
value3: 115
character4: nil
value4: nil
Receiver's instance variables: 
an UTF8TextConverter

MultiByteFileStream>>next
Receiver: MultiByteFileStream: '/Users/fabrizioperin/development/Pharo/WORKINGONNOW/MooseJEE_64/src/...etc...
Arguments and temporary variables: 
char: nil
secondChar: nil
state: nil
Receiver's instance variables: 


XMLStreamReader>>basicNext
Receiver: a XMLStreamReader
Arguments and temporary variables: 
nextChar: nil
Receiver's instance variables: 
stream: MultiByteFileStream: '/Users/fabrizioperin/development/Pharo/WORKINGONN...etc...
nestedStreams: nil
peekChar: nil
buffer: a WriteStream 'SES: Bean zum Einlesen und updaten der Stako relevanten ...etc...

XMLStreamReader>>next
Receiver: a XMLStreamReader
Arguments and temporary variables: 
nextChar: nil
Receiver's instance variables: 
stream: MultiByteFileStream: '/Users/fabrizioperin/development/Pharo/WORKINGONN...etc...
nestedStreams: nil
peekChar: nil
buffer: a WriteStream 'SES: Bean zum Einlesen und updaten der Stako relevanten ...etc...

XMLStreamReader>>upToAll:
Receiver: a XMLStreamReader
Arguments and temporary variables: 
aDelimitingString: ']]>'
Receiver's instance variables: 
stream: MultiByteFileStream: '/Users/fabrizioperin/development/Pharo/WORKINGONN...etc...
nestedStreams: nil
peekChar: nil
buffer: a WriteStream 'SES: Bean zum Einlesen und updaten der Stako relevanten ...etc...

SAXDriver(XMLTokenizer)>>nextCDataContent
Receiver: a SAXDriver
Arguments and temporary variables: 
cdata: nil
Receiver's instance variables: 
streamReader: a XMLStreamReader
streamWriter: a XMLStreamWriter
entities: nil
externalEntities: nil
parameterEntities: nil
isValidating: false
parsingMarkup: false
saxHandler: an OPOpaxHandler
openTags: <ejb-jar>, <enterprise-beans>, <session>, <description>
nestedScopes: nil
useNamespaces: false
validateAttributes: nil
languageEnvironment: nil

SAXDriver(XMLTokenizer)>>nextCDataOrConditional
Receiver: a SAXDriver
Arguments and temporary variables: 
nextChar: $C
conditionalKeyword: nil
Receiver's instance variables: 
streamReader: a XMLStreamReader
streamWriter: a XMLStreamWriter
entities: nil
externalEntities: nil
parameterEntities: nil
isValidating: false
parsingMarkup: false
saxHandler: an OPOpaxHandler
openTags: <ejb-jar>, <enterprise-beans>, <session>, <description>
nestedScopes: nil
useNamespaces: false
validateAttributes: nil
languageEnvironment: nil

SAXDriver(XMLTokenizer)>>nextMarkupToken
Receiver: a SAXDriver
Arguments and temporary variables: 
nextChar: $[
Receiver's instance variables: 
streamReader: a XMLStreamReader
streamWriter: a XMLStreamWriter
entities: nil
externalEntities: nil
parameterEntities: nil
isValidating: false
parsingMarkup: false
saxHandler: an OPOpaxHandler
openTags: <ejb-jar>, <enterprise-beans>, <session>, <description>
nestedScopes: nil
useNamespaces: false
validateAttributes: nil
languageEnvironment: nil

SAXDriver(XMLTokenizer)>>nextToken
Receiver: a SAXDriver
Arguments and temporary variables: 
whitespace: ''
Receiver's instance variables: 
streamReader: a XMLStreamReader
streamWriter: a XMLStreamWriter
entities: nil
externalEntities: nil
parameterEntities: nil
isValidating: false
parsingMarkup: false
saxHandler: an OPOpaxHandler
openTags: <ejb-jar>, <enterprise-beans>, <session>, <description>
nestedScopes: nil
useNamespaces: false
validateAttributes: nil
languageEnvironment: nil

OPOpaxHandler(SAXHandler)>>parseDocument
Receiver: an OPOpaxHandler
Arguments and temporary variables: 

Receiver's instance variables: 
driver: a SAXDriver
eod: false
stack: an OrderedCollection(<?xml version="1.0" encoding="utf-8"?>
<ejb-jar id=...etc...

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Re: Error parsing XML File

Alexandre Bergel
Hi Fabrizio,

I think you're in the right place to talk about that.

I haven't been able to reproduce your error.
I added a test:

XMLParserTest>>testNonUTF8Characters

        "Parsing a string that contains German umlauts should not raise an error."
        self
                shouldnt: [XMLDOMParser parseDocumentFrom:
                        '<foo>Bean BLABLABLA Eidgenössisches Institut für BLABLALBLA</foo>' readStream]
                raise: Error.

It goes green in my image. Do you have a different way of getting the readStream from the String?

Cheers,
Alexandre

On 26 Mar 2010, at 12:14, Fabrizio Perin wrote:

> Hi,
> I was parsing an XML file with the latest version of the XML parser
> (XML-Parser-JAAyer.68) and I got an error about a non-UTF-8 character
> that the parser found in the document. The XML document contains some
> German characters:
>
> <![CDATA[SES: Bean BLABLABLA Eidgenössisches Institut für BLABLALBLA]]>
>
> I'm not sure whether the error is in the UTF8TextConverter or whether
> something is wrong in how the parser invokes it. In any case, I parsed
> the same document several times with an older version of the XML parser
> (XML-Parser-JAAyer.57) and it always worked. I'm not sure whether the
> Pharo mailing list is the right place to report this problem; if it
> isn't, I'm sorry.

--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel  http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.







Re: Error parsing XML File

Fabrizio Perin-3
Hi Alex,
Thanks for your effort. The problem was indeed related to the readStream: my method for importing the XML files used the class FileStream; it now uses StandardFileStream and everything works fine. All tests were green in my image too (including your new test), but the error was still raised when importing from a file. So I investigated how the readStream was obtained from the file and found the solution. I'm still not sure what the problem is with using FileStream instead of StandardFileStream.

Thanks a lot,

Fabrizio
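
A minimal sketch of the two ways of opening the file described above, assuming a Pharo 1.0 image with XML-Parser loaded. The file name 'ejb-jar.xml' is hypothetical and only there for illustration. The comments on how each stream class treats bytes reflect the general behaviour of these classes; why the first variant fails is not established in this thread.

```smalltalk
| fileName stream document |
fileName := 'ejb-jar.xml'.  "hypothetical file name, for illustration only"

"Before: FileStream answers a MultiByteFileStream, which decodes the bytes
 through a text converter (a UTF8TextConverter in the trace above).
 This is the variant that raised 'Invalid utf8 input detected'."
stream := FileStream readOnlyFileNamed: fileName.
[document := XMLDOMParser parseDocumentFrom: stream]
	ensure: [stream close].

"After: StandardFileStream answers one character per byte and does no
 decoding of its own, so the UTF8TextConverter is never involved."
stream := StandardFileStream readOnlyFileNamed: fileName.
[document := XMLDOMParser parseDocumentFrom: stream]
	ensure: [stream close].
```

Because StandardFileStream does not decode, a multi-byte character such as ö would presumably reach the parser as two separate characters, so this looks more like a workaround than a fix for the underlying problem.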

2010/3/26 Alexandre Bergel <[hidden email]>
> Hi Fabrizio,
>
> I think you're in the right place to talk about that.
>
> I haven't been able to reproduce your error.
> I added a test:
>
> XMLParserTest>>testNonUTF8Characters
>
>        self shouldnt: [XMLDOMParser parseDocumentFrom:
>                '<foo>Bean BLABLABLA Eidgenössisches Institut für BLABLALBLA</foo>' readStream] raise: Error.
>
> It goes green in my image. Do you have a different way to get the readStream from the String?
>
> Cheers,
> Alexandre



Re: Error parsing XML File

Alexandre Bergel
Can you send me the file please?

Alexandre


On 27 Mar 2010, at 12:47, Fabrizio Perin wrote:

> Hi Alex,
> Thanks for your effort. The problem was indeed related to the
> readStream: my method for importing the XML files used the class
> FileStream; it now uses StandardFileStream and everything works fine.
> All tests were green in my image too (including your new test), but
> the error was still raised when importing from a file. So I
> investigated how the readStream was obtained from the file and found
> the solution. I'm still not sure what the problem is with using
> FileStream instead of StandardFileStream.
>
> Thanks a lot,
>
> Fabrizio

--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel  http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.







Re: Error parsing XML File

hernanmd
Hi Alex,
  Try it with the attached XML file: put it in your Pharo directory and evaluate

XMLDOMParser
        parseDocumentFromFileNamed:
                (FileDirectory default fullNameFor: 'likelySubtags.xml')

However, it seems to be fixed in XML-Parser-JAAyer.72.
Cheers,

Hernán

2010/3/27 Alexandre Bergel <[hidden email]>:

> Can you send me the file please?
>
> Alexandre
>
>
> On 27 Mar 2010, at 12:47, Fabrizio Perin wrote:
>
>> Hi Alex,
>> thanks for your effort. The problem was indeed related to the
>> readStream: my method to import the XML files used the class FileStream;
>> now it uses StandardFileStream and everything works fine. All tests
>> were green in my image too (including your new test), but the error was
>> still raised when importing from a file. So I investigated how the
>> readStream is obtained from a file and found the solution. I'm still not
>> sure what the problem is with using FileStream instead of StandardFileStream.
>>
>> Thanks a lot,
>>
>> Fabrizio
>>
>> 2010/3/26 Alexandre Bergel <[hidden email]>
>> Hi Fabrizio,
>>
>> I think you're in the right place to talk about that.
>>
>> I haven't been able to reproduce your error.
>> I added a test:
>>
>> XMLParserTest>>testNonUTF8Characters
>>
>>       self shouldnt: [XMLDOMParser parseDocumentFrom:
>>               '<foo>Bean BLABLABLA Eidgenössisches Institut für
>> BLABLALBLA</foo>' readStream] raise: Error.
>>
>> It goes green in my image. Do you have a different way to get the
>> readStream from the String?
>>
>> Cheers,
>> Alexandre
>>
>>
>> On 26 Mar 2010, at 12:14, Fabrizio Perin wrote:
>>
>> Hi,
>> I was parsing an XML file with the latest version of XML Parser
>> (XML-Parser-JAAyer.68) and I got an error related to a non-UTF-8 character
>> that the parser found in the document. The XML document contains some
>> German characters:
>>
>> <![CDATA[SES: Bean BLABLABLA Eidgenössisches Institut für BLABLALBLA]]>
>>
>> I'm not sure whether the error is in the UTF8TextConverter or whether
>> something is wrong in the invocation from the parser. I parsed the same
>> document several times with an older version of XML-Parser
>> (XML-Parser-JAAyer.57) and it always worked well. I'm not sure whether the
>> Pharo mailing list is the right place to report this problem; if it is not,
>> I'm sorry.
>>
>> Here is the trace from the log:
>>

likelySubtags.zip (11K) Download Attachment

Re: Error parsing XML File

jaayer
---- On Sun, 28 Mar 2010 11:47:43 -0700 Hernán Morales Durand <[hidden email]> wrote ----

>Hi Alex,
> Try with the attached XML file in your Pharo directory
>
>XMLDOMParser
>    parseDocumentFromFileNamed:
>        (FileDirectory default fullNameFor: 'likelySubtags.xml')
>
>however, it seems to be fixed in XML-Parser-JAAyer.72
>Cheers,

The problem was due to XMLStreamReader>>nextMatchAll: relying on #position: and #position and MultiByteFileStream understanding stream position in terms of bytes rather than characters. That's why it worked when Alexandre tried parsing a string containing multi-byte UTF-8 characters--the position of a ReadStream on a string corresponds to the position of characters in the string regardless of their width--but failed when Fabrizio tried to parse a file containing those same characters. It is fixed now and a little faster, too.
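
To illustrate the byte-versus-character distinction described above, here is a minimal sketch in Python. It is not the XMLStreamReader code, which is not shown in this thread. The helper `next_char` is hypothetical, and "step back one position" is only an assumed stand-in for whatever position arithmetic the parser performed. On a string, a position is a character index, so stepping back one always lands on a character boundary. On a UTF-8 byte stream, stepping back one position after reading 'ö' lands on its second byte, 0xB6 (182, the same value reported as value1 in the trace), which cannot start a character.

```python
import io


def next_char(raw: io.BytesIO) -> str:
    """Decode one UTF-8 character starting at the current byte position.

    Hypothetical helper for illustration only.
    """
    lead = raw.read(1)
    if not lead:
        raise EOFError("end of stream")
    b = lead[0]
    if b < 0x80:
        width = 1
    elif 0xC0 <= b < 0xE0:
        width = 2
    elif 0xE0 <= b < 0xF0:
        width = 3
    elif 0xF0 <= b < 0xF8:
        width = 4
    else:
        # A continuation byte (0x80-0xBF) cannot start a character.
        raise ValueError("Invalid utf8 input detected")
    return (lead + raw.read(width - 1)).decode("utf-8")


text = "ös"

# In-memory string: the position is a character index.
pos = 1                      # just past 'ö'
pos -= 1                     # back one character
string_result = text[pos]    # the whole character 'ö'

# Byte stream: the position is a byte offset. 'ö' is two bytes (0xC3 0xB6).
raw = io.BytesIO(text.encode("utf-8"))
next_char(raw)               # consumes two bytes; the position is now 2
raw.seek(raw.tell() - 1)     # meant as "back one character", but is one byte
try:
    next_char(raw)
    file_result = "ok"
except ValueError as err:
    file_result = str(err)

print(string_result)
print(file_result)
```

Any position arithmetic that counts characters is therefore only safe on a stream whose positions are also counted in characters.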



Re: Error parsing XML File

Stéphane Ducasse
tx!
On Mar 28, 2010, at 9:24 PM, jaayer wrote:

> ---- On Sun, 28 Mar 2010 11:47:43 -0700 Hernán Morales Durand <[hidden email]> wrote ----
>
>> Hi Alex,
>> Try with the attached XML file in your Pharo directory
>>
>> XMLDOMParser
>>     parseDocumentFromFileNamed:
>>         (FileDirectory default fullNameFor: 'likelySubtags.xml')
>>
>> however, it seems to be fixed in XML-Parser-JAAyer.72
>> Cheers,
>
> The problem was due to XMLStreamReader>>nextMatchAll: relying on #position: and #position and MultiByteFileStream understanding stream position in terms of bytes rather than characters. That's why it worked when Alexandre tried parsing a string containing multi-byte UTF-8 characters--the position of a ReadStream on a string corresponds to the position of characters in the string regardless of their width--but failed when Fabrizio tried to parse a file containing those same characters. It is fixed now and a little faster, too.

Re: Error parsing XML File

Fabrizio Perin-3
@Alex: sorry, but I cannot send you the file (I shouldn't even have it).

Thanks a lot for the support and the explanation :)

Fabrizio


Re: Error parsing XML File

Alexandre Bergel
In reply to this post by hernanmd
Hi Hernán,

Thanks for your file. I added it as a test in XML-Parser.

Cheers,
Alexandre

