Hi,
I was parsing an XML file with the latest version of XML-Parser (XML-Parser-JAAyer.68) and got an error about a non-UTF-8 character that the parser found in the document. The XML document contains some German characters:

<![CDATA[SES: Bean BLABLABLA Eidgenössisches Institut für BLABLALBLA]]>

I'm not sure whether the error is in UTF8TextConverter or in the way the parser invokes it. In any case, I have parsed the same document several times with an older version of XML-Parser (XML-Parser-JAAyer.57) and it always worked well. I'm not sure whether the Pharo mailing list is the right place to report this problem; if it isn't, I'm sorry.
Here is the trace from the log:

Error: Invalid utf8 input detected
26 March 2010 4:14:07 pm

VM: Mac OS - intel - 1062 - Squeak3.8.1 of '28 Aug 2006' [latest update: #6747] Squeak VM 4.2.2b1
Image: Pharo-1.0-10515-rc3 [Latest update: #10515]

SecurityManager state:
Restricted: false
FileAccess: true
SocketAccess: true
Working Dir /Users/fabrizioperin/development/Pharo/WORKINGONNOW/MooseJEE_64
Trusted Dir /foobar/tooBar/forSqueak/bogus
Untrusted Dir /Users/fabrizioperin/Library/Preferences/Squeak/Internet/My Squeak

UTF8TextConverter(Object)>>error:
    Receiver: an UTF8TextConverter
    Arguments and temporary variables:
        aString: 'Invalid utf8 input detected'
    Receiver's instance variables:
        an UTF8TextConverter

UTF8TextConverter>>errorMalformedInput
    Receiver: an UTF8TextConverter
    Arguments and temporary variables:
    Receiver's instance variables:
        an UTF8TextConverter

UTF8TextConverter>>nextFromStream:
    Receiver: an UTF8TextConverter
    Arguments and temporary variables:
        aStream: MultiByteFileStream: '/Users/fabrizioperin/development/Pharo/WORKINGON...etc...
        character1: $¶
        value1: 182
        character2: $s
        value2: 115
        unicode: nil
        character3: $s
        value3: 115
        character4: nil
        value4: nil
    Receiver's instance variables:
        an UTF8TextConverter

MultiByteFileStream>>next
    Receiver: MultiByteFileStream: '/Users/fabrizioperin/development/Pharo/WORKINGONNOW/MooseJEE_64/src/...etc...
    Arguments and temporary variables:
        char: nil
        secondChar: nil
        state: nil
    Receiver's instance variables:

XMLStreamReader>>basicNext
    Receiver: a XMLStreamReader
    Arguments and temporary variables:
        nextChar: nil
    Receiver's instance variables:
        stream: MultiByteFileStream: '/Users/fabrizioperin/development/Pharo/WORKINGONN...etc...
        nestedStreams: nil
        peekChar: nil
        buffer: a WriteStream 'SES: Bean zum Einlesen und updaten der Stako relevanten ...etc...

XMLStreamReader>>next
    Receiver: a XMLStreamReader
    Arguments and temporary variables:
        nextChar: nil
    Receiver's instance variables:
        stream: MultiByteFileStream: '/Users/fabrizioperin/development/Pharo/WORKINGONN...etc...
        nestedStreams: nil
        peekChar: nil
        buffer: a WriteStream 'SES: Bean zum Einlesen und updaten der Stako relevanten ...etc...

XMLStreamReader>>upToAll:
    Receiver: a XMLStreamReader
    Arguments and temporary variables:
        aDelimitingString: ']]>'
    Receiver's instance variables:
        stream: MultiByteFileStream: '/Users/fabrizioperin/development/Pharo/WORKINGONN...etc...
        nestedStreams: nil
        peekChar: nil
        buffer: a WriteStream 'SES: Bean zum Einlesen und updaten der Stako relevanten ...etc...

SAXDriver(XMLTokenizer)>>nextCDataContent
    Receiver: a SAXDriver
    Arguments and temporary variables:
        cdata: nil
    Receiver's instance variables:
        streamReader: a XMLStreamReader
        streamWriter: a XMLStreamWriter
        entities: nil
        externalEntities: nil
        parameterEntities: nil
        isValidating: false
        parsingMarkup: false
        saxHandler: an OPOpaxHandler
        openTags: <ejb-jar>, <enterprise-beans>, <session>, <description>
        nestedScopes: nil
        useNamespaces: false
        validateAttributes: nil
        languageEnvironment: nil

SAXDriver(XMLTokenizer)>>nextCDataOrConditional
    Receiver: a SAXDriver
    Arguments and temporary variables:
        nextChar: $C
        conditionalKeyword: nil
    Receiver's instance variables:
        streamReader: a XMLStreamReader
        streamWriter: a XMLStreamWriter
        entities: nil
        externalEntities: nil
        parameterEntities: nil
        isValidating: false
        parsingMarkup: false
        saxHandler: an OPOpaxHandler
        openTags: <ejb-jar>, <enterprise-beans>, <session>, <description>
        nestedScopes: nil
        useNamespaces: false
        validateAttributes: nil
        languageEnvironment: nil

SAXDriver(XMLTokenizer)>>nextMarkupToken
    Receiver: a SAXDriver
    Arguments and temporary variables:
        nextChar: $[
    Receiver's instance variables:
        streamReader: a XMLStreamReader
        streamWriter: a XMLStreamWriter
        entities: nil
        externalEntities: nil
        parameterEntities: nil
        isValidating: false
        parsingMarkup: false
        saxHandler: an OPOpaxHandler
        openTags: <ejb-jar>, <enterprise-beans>, <session>, <description>
        nestedScopes: nil
        useNamespaces: false
        validateAttributes: nil
        languageEnvironment: nil

SAXDriver(XMLTokenizer)>>nextToken
    Receiver: a SAXDriver
    Arguments and temporary variables:
        whitespace: ''
    Receiver's instance variables:
        streamReader: a XMLStreamReader
        streamWriter: a XMLStreamWriter
        entities: nil
        externalEntities: nil
        parameterEntities: nil
        isValidating: false
        parsingMarkup: false
        saxHandler: an OPOpaxHandler
        openTags: <ejb-jar>, <enterprise-beans>, <session>, <description>
        nestedScopes: nil
        useNamespaces: false
        validateAttributes: nil
        languageEnvironment: nil

OPOpaxHandler(SAXHandler)>>parseDocument
    Receiver: an OPOpaxHandler
    Arguments and temporary variables:
    Receiver's instance variables:
        driver: a SAXDriver
        eod: false
        stack: an OrderedCollection(<?xml version="1.0" encoding="utf-8"?>
<ejb-jar id=...etc...

_______________________________________________
Pharo-project mailing list
[hidden email]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
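The numbers in the UTF8TextConverter>>nextFromStream: frame above can be checked by hand. The workspace snippet below is an editorial sketch, not part of XML-Parser; the byte array is the UTF-8 encoding of "Eidgenössisches" written out by hand, assuming the file really is UTF-8 as its encoding="utf-8" declaration says. It shows that 182 (printed as $¶, which is simply the character with value 182) is the second, continuation byte of ö, and that the two 115s are the "ss" that follows it. The trace is therefore consistent with the decoder having started reading in the middle of a character.

```smalltalk
| codePoint lead continuation bytes |
"UTF-8 stores code points 128-2047 as two bytes: 110xxxxx 10xxxxxx."
codePoint := $ö asInteger.                               "246"
lead := 16rC0 bitOr: (codePoint bitShift: -6).           "195 (16rC3)"
continuation := 16r80 bitOr: (codePoint bitAnd: 16r3F).  "182 (16rB6)"

"'Eidgenössisches' encoded as UTF-8, written out by hand: 15 characters, 16 bytes."
bytes := #[69 105 100 103 101 110 195 182 115 115 105 115 99 104 101 115].
'Eidgenössisches' size.              "15"
bytes size.                          "16"
"Bytes 8 to 10 are the value1, value2 and value3 seen in the trace."
(bytes copyFrom: 8 to: 10) asArray.  "#(182 115 115)"
```

Starting at byte 7 would give 195 182, which decodes to $ö; starting one byte later gives 182 as the first byte, which matches none of the UTF-8 lead-byte patterns, so the converter reports malformed input.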
Hi Fabrizio,
I think you're in the right place to talk about that.

I haven't been able to reproduce your error. I added a test:

    XMLParserTest>>testNonUTF8Characters
        self
            shouldnt: [XMLDOMParser parseDocumentFrom: '<foo>Bean BLABLABLA Eidgenössisches Institut für BLABLALBLA</foo>' readStream]
            raise: Error

It goes green in my image. Do you have a different way to get the readStream from the String?

Cheers,
Alexandre

On 26 Mar 2010, at 12:14, Fabrizio Perin wrote:
> <![CDATA[SES: Bean BLABLABLA Eidgenössisches Institut für BLABLALBLA]]>
--
Alexandre Bergel  http://www.bergel.eu
Hi Alex,
Thanks for your effort. The problem was indeed related to the readStream: my method for importing the XML files used the class FileStream; now it uses StandardFileStream instead and everything works fine. All tests were green in my image too (including your new test), but the error was still raised when importing from a file. So I investigated how the readStream is obtained from a file and found the solution. I'm still not sure what the problem is with using FileStream instead of StandardFileStream.
Thanks a lot, Fabrizio
2010/3/26 Alexandre Bergel:
> Do you have a different way to get the readStream from the String?
Can you send me the file please?
Alexandre

On 27 Mar 2010, at 12:47, Fabrizio Perin wrote:
> my method to import the XML files uses the class FileStream
Hi Alex,
Try this with the attached XML file, placed in your Pharo directory:

    XMLDOMParser
        parseDocumentFromFileNamed: (FileDirectory default fullNameFor: 'likelySubtags.xml')

However, it seems to be fixed in XML-Parser-JAAyer.72.

Cheers,

Hernán

2010/3/27 Alexandre Bergel:
> Can you send me the file please?

Attachment: likelySubtags.zip (11K)
---- On Sun, 28 Mar 2010 11:47:43 -0700 Hernán Morales Durand <[hidden email]> wrote ----
>however, it seems to be fixed in XML-Parser-JAAyer.72

The problem was due to XMLStreamReader>>nextMatchAll: relying on #position and #position:, while MultiByteFileStream understands stream positions in terms of bytes rather than characters. That's why it worked when Alexandre parsed a string containing multi-byte UTF-8 characters (the position of a ReadStream on a string corresponds to the position of characters in the string, regardless of their width) but failed when Fabrizio parsed a file containing those same characters. It is fixed now, and a little faster, too.
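One way to sidestep the byte/character mismatch described above is to search for a delimiter such as ']]>' using only #next and #atEnd, so that no stream position is ever saved and restored. The workspace snippet below is a minimal sketch of that idea on a plain ReadStream; it is not the code that shipped in XML-Parser-JAAyer.72, and it assumes String>>endsWith: is available in the image.

```smalltalk
| stream delimiter buffer result |
delimiter := ']]>'.
stream := ReadStream on: 'SES: Bean BLABLABLA Eidgenössisches Institut für BLABLALBLA]]><rest/>'.
buffer := WriteStream on: (String new: 64).
result := nil.
[result isNil and: [stream atEnd not]] whileTrue: [
	"Read one character; never send #position or #position: to the source stream."
	buffer nextPut: stream next.
	(buffer contents endsWith: delimiter) ifTrue: [
		"Answer what was read, minus the delimiter itself."
		result := buffer contents copyFrom: 1 to: buffer contents size - delimiter size]].
"No delimiter found: answer everything that was read."
result isNil ifTrue: [result := buffer contents].
result.          "'SES: Bean BLABLABLA Eidgenössisches Institut für BLABLALBLA'"
stream upToEnd.  "'<rest/>'"
```

Because the loop only sends #next and #atEnd, it should behave the same on a MultiByteFileStream, where each #next already answers a decoded character. Re-reading `buffer contents` on every character makes the sketch quadratic in the length of the CDATA section; a real implementation would compare only the last few characters.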
tx!
On Mar 28, 2010, at 9:24 PM, jaayer wrote:
> It is fixed now and a little faster, too.
@Alex: sorry, but I cannot send you the file (I shouldn't even have it).

Thanks a lot for the support and the explanation :)

Fabrizio

2010/3/28 Stéphane Ducasse:
> tx!
Hi Hernán,
Thanks for your file. I added it as a test in XML-Parser.

Cheers,
Alexandre

On 28 Mar 2010, at 14:47, Hernán Morales Durand wrote:
> Try with the attached XML file in your Pharo directory
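A file-based counterpart to the string-based testNonUTF8Characters could look like the sketch below. It is hypothetical: the method name and file name are invented here, and it is not the test Alexandre actually added. It assumes that FileStream forceNewFileNamed: answers a MultiByteFileStream (the class seen in the trace), so that #converter: is understood, and it reuses the parseDocumentFromFileNamed: call from Hernán's message.

```smalltalk
XMLParserTest>>testNonASCIICDataFromFile
	"Hypothetical test: write the CDATA from the original report to a UTF-8 file,
	then parse it back from disk, the code path that failed with XML-Parser-JAAyer.68."
	| localName fullName stream |
	localName := 'nonAsciiCData.xml'.
	fullName := FileDirectory default fullNameFor: localName.
	stream := FileStream forceNewFileNamed: fullName.
	[stream converter: UTF8TextConverter new.
	stream nextPutAll: '<?xml version="1.0" encoding="utf-8"?>',
		'<foo><![CDATA[SES: Bean BLABLABLA Eidgenössisches Institut für BLABLALBLA]]></foo>']
		ensure: [stream close].
	"Parse from the file, then delete it whether or not parsing succeeded."
	[self
		shouldnt: [XMLDOMParser parseDocumentFromFileNamed: fullName]
		raise: Error]
		ensure: [FileDirectory default deleteFileNamed: localName]
```

Unlike a ReadStream on a String, this goes through the file stream's UTF-8 decoding, which is where the string-based test and the real import diverged.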