Hi,
Ok... this is the problem: I have a customer who likes to add stupid Unicode to his strings (like strange open/close quotes, not the regular ones). And of course, the BSON driver does not handle them well... in fact, it persists them fine, but when the customer tries to read them back, it throws an error (Invalid type).

So... I started to investigate, and I figured out that the problem is that when the answered string is a WideString, the size read and the size expected can differ (depending on the number of 2+ byte characters in the Unicode string).

After some tries, I came up with this incredibly ugly hack that works:

BSON>>nextSizedString
    | size result |
    size := stream nextUInt32.
    result := stream nextString.
    result isWideString ifTrue: [
        stream skip: (size - (result
            collect: [ :each | each asString asByteArray size ]
            as: OrderedCollection) sum) - 2 ].
    ^result

LittleEndianStream>>skip: aNumber
    stream skip: aNumber

As you can see, it "calculates" the real consumed bytes (which is strangely less than the declared size) and skips to its real position minus 2.

This works, at least in the examples I have at hand...

Now, my questions:

- I'm sure there has to be a better way to calculate the difference, but I didn't find a good one.
- Why that "- 2"?????? WTF... what does that mean????
- Can someone confirm that the fix works? (I have "production issues", I need something that works as fast as I can.)

Thanks,
Esteban
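For reference, the BSON specification lays out a string as an int32 length, then the UTF-8 bytes, then a trailing 0x00, and the length counts the bytes plus that terminator, not the characters. The sketch below shows why a character count and the declared size drift apart as soon as multi-byte characters appear, and how reading exactly the declared number of bytes keeps the stream aligned. It is written in Python only so it can be run standalone; `encode_bson_string` and `read_sized_string` are hypothetical helper names, not part of any driver.

```python
import io
import struct


def encode_bson_string(text: str) -> bytes:
    """Encode text as a BSON string: int32 length, UTF-8 bytes, 0x00."""
    payload = text.encode("utf-8")
    # The length prefix counts the UTF-8 bytes plus the trailing NUL.
    return struct.pack("<i", len(payload) + 1) + payload + b"\x00"


def read_sized_string(stream) -> str:
    """Read one BSON string by consuming exactly the declared byte count."""
    (size,) = struct.unpack("<i", stream.read(4))
    raw = stream.read(size)
    if len(raw) != size or raw[-1:] != b"\x00":
        raise ValueError("truncated or unterminated BSON string")
    # Decode everything except the terminator.
    return raw[:-1].decode("utf-8")


text = "\u201cquoted\u201d"  # curly quotes take 3 UTF-8 bytes each
# Append one extra byte standing in for the next element's type byte.
stream = io.BytesIO(encode_bson_string(text) + b"\x10")
decoded = read_sized_string(stream)
next_byte = stream.read(1)
```

Because the reader consumes bytes rather than characters, no skip or correction term is needed afterwards; whether the Smalltalk stream wrapper offers an equivalent "read N bytes" call is an assumption to check against the driver.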
On 1 July 2012 15:10, Esteban Lorenzano <[hidden email]> wrote:
> - why that "- 2"?????? WTF... what does that means????

Maybe it's a BOM (byte order mark) character? What Unicode encoding is used on the server? UTF-8, I guess?

--
Best regards,
Igor Stasenko.
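One way to test the BOM idea is to dump the raw bytes of a failing string and look at how they start. A minimal sketch in Python, assuming the raw bytes can be captured before decoding; `leading_bom` is a hypothetical helper, not part of the driver. Note that the UTF-8 BOM is three bytes long, while the UTF-16 BOMs are two.

```python
import codecs


def leading_bom(raw: bytes):
    """Return the name of the BOM that raw starts with, or None."""
    candidates = (
        ("utf-8", codecs.BOM_UTF8),          # EF BB BF
        ("utf-16-le", codecs.BOM_UTF16_LE),  # FF FE
        ("utf-16-be", codecs.BOM_UTF16_BE),  # FE FF
    )
    for name, bom in candidates:
        if raw.startswith(bom):
            return name
    return None


plain = "\u201cx\u201d".encode("utf-8")
with_bom = codecs.BOM_UTF8 + plain
```

If `leading_bom` returns `None` for the stored bytes, the offset has to come from somewhere other than a byte order mark.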
In reply to this post by EstebanLM
http://ss3.gemstone.com/ss/MongoSt.html suggests that some WideString problems were solved in this fork...

Nicolas