MongoTalk issue

EstebanLM
Hi,

OK, here is the problem: I have a customer who likes to put odd Unicode characters in his strings (like strange open-close colons, not the regular ones).
And of course, the BSON driver does not handle them well. In fact, it persists them fine, but when the customer tries to read them back, it throws an error (Invalid type).
So I started to investigate, and I figured out that the problem is this: when the answered string is a WideString, the size read and the size expected can differ (depending on the number of 2+ byte characters in the Unicode string).
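
To make the mismatch concrete, here is a small sketch in Python (used purely to show the arithmetic, not Smalltalk; the sample string is made up). A string containing non-ASCII punctuation has fewer characters than UTF-8 bytes, so a reader that mixes up the two counts ends up at the wrong stream position.

```python
# Hypothetical sample: two curly quotes around two ASCII letters.
sample = "\u201chi\u201d"

char_count = len(sample)                  # number of characters
byte_count = len(sample.encode("utf-8"))  # number of bytes once encoded

# Each curly quote takes 3 bytes in UTF-8; each ASCII letter takes 1.
print(char_count, byte_count)  # 4 8
```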

After some tries, I came up with this incredibly ugly hack that works:

BSON>>nextSizedString
        "Workaround: after reading a WideString, skip ahead by the declared
        size minus the computed byte count, minus 2."
        | size result |
        size := stream nextUInt32.
        result := stream nextString.
        result isWideString ifTrue: [
                stream skip: (size - (result
                        collect: [ :each | each asString asByteArray size ]
                        as: OrderedCollection) sum) - 2 ].

        ^result

LittleEndianStream>>skip: aNumber
        stream skip: aNumber

As you can see, it "calculates" the bytes really consumed (which is strangely fewer than the declared size) and skips to the real position minus 2.

This works, at least with the examples I have at hand.

Now, my questions:

- I'm sure there has to be a better way to calculate the difference, but I didn't find a good one.
- Why that "- 2"?! What does it mean?
- Can someone confirm that the fix works? (I have "production issues"; I need something that works as fast as I can get it.)
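
On the first question, one possible alternative, sketched under the assumption that the stream follows the string layout in the BSON specification: a little-endian int32 length, then that many bytes, the last of which is a NUL terminator. The length counts UTF-8 bytes (terminator included), not characters, so reading exactly that many bytes and decoding afterwards leaves nothing to skip. This is Python for illustration, not MongoTalk code; `write_sized_string` and `read_sized_string` are hypothetical names.

```python
import io
import struct

def write_sized_string(text: str) -> bytes:
    """Encode text as int32 length + UTF-8 bytes + NUL terminator."""
    payload = text.encode("utf-8") + b"\x00"
    return struct.pack("<i", len(payload)) + payload

def read_sized_string(stream: io.BufferedIOBase) -> str:
    """Read one sized string, consuming exactly 4 + size bytes."""
    (size,) = struct.unpack("<i", stream.read(4))
    raw = stream.read(size)
    if len(raw) != size or not raw.endswith(b"\x00"):
        raise ValueError("truncated or unterminated string")
    # Decode only after the byte count has been honoured.
    return raw[:-1].decode("utf-8")

# Two strings back to back: the second is only readable if the first
# read consumed exactly its declared number of bytes.
data = write_sized_string("\u201cwide\u201d") + write_sized_string("next")
stream = io.BytesIO(data)
first = read_sized_string(stream)
second = read_sized_string(stream)
```

The sketch does not explain the "- 2"; it only shows that no correction term is needed when the declared length is treated as a byte count.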

Thanks,
Esteban



Re: MongoTalk issue

Igor Stasenko
On 1 July 2012 15:10, Esteban Lorenzano <[hidden email]> wrote:

> - Why that "- 2"?! What does it mean?

Maybe it's a BOM (byte order mark) character?
What Unicode encoding is used on the server? UTF-8, I guess?
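
One way to test the BOM idea is to look at the raw bytes before decoding them. A rough sketch in Python (illustrative only; `leading_bom` is a hypothetical helper), using the BOM constants from the standard `codecs` module. Note that the UTF-8 BOM is 3 bytes long, while the UTF-16 ones are 2.

```python
import codecs
from typing import Optional

# BOM constants from the standard library (UTF-32 left out for brevity).
BOMS = [
    ("utf-8", codecs.BOM_UTF8),          # EF BB BF, 3 bytes
    ("utf-16-le", codecs.BOM_UTF16_LE),  # FF FE, 2 bytes
    ("utf-16-be", codecs.BOM_UTF16_BE),  # FE FF, 2 bytes
]

def leading_bom(raw: bytes) -> Optional[str]:
    """Return the name of the encoding whose BOM starts raw, or None."""
    for name, bom in BOMS:
        if raw.startswith(bom):
            return name
    return None

# Made-up inputs: one with a UTF-8 BOM, one with plain non-ASCII text.
with_bom = codecs.BOM_UTF8 + "hi".encode("utf-8")
without_bom = "\u201chi\u201d".encode("utf-8")
```

If `leading_bom` answers `None` for the bytes that actually come back from the server, the BOM explanation can be ruled out.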




--
Best regards,
Igor Stasenko.

Re: MongoTalk issue

Nicolas Cellier
In reply to this post by EstebanLM
http://ss3.gemstone.com/ss/MongoSt.html suggests that some WideString
problems were solved in this fork...

Nicolas
