Issue 306 in glassdb: handling WideStrings instances in mcz files

glassdb
Status: Accepted
Owner: [hidden email]
Labels: Type-Defect Priority-Medium GLASS-Server Version-1.0-beta.8.7

New issue 306 by [hidden email]: handling WideStrings instances in  
mcz files
http://code.google.com/p/glassdb/issues/detail?id=306

Occasionally an mcz file gets committed to a repository with an embedded
WideString instance ... this of course causes havoc if GemStone tries to
open the mcz file ... Henrik Johansen has a set of patches for GemStone that
convert WideString instances into QuadByteStrings (and presumably vice
versa?).

For SS3 to serve all sorts of mcz files, this type of WideString support
is necessary (http://code.google.com/p/squeaksource3/issues/detail?id=17).

I've attached Henrik's mcz files.

Attachments:
        Core-HenrikSperreJohansen.35.mcz  117 KB
        Monticello-HenrikSperreJohansen.412.mcz  156 KB
        GSWideTest-HenrikSperreJohansen.5.mcz  1.4 KB
        GSWideTest-HenrikSperreJohansen.4.mcz  1.3 KB
        GSWideTest-HenrikSperreJohansen.1.mcz  1.1 KB

Re: Issue 306 in glassdb: handling WideStrings instances in mcz files

glassdb

Comment #1 on issue 306 by norbert.hartl: handling WideStrings instances in  
mcz files
http://code.google.com/p/glassdb/issues/detail?id=306

I think this is not a "fix". It covers up a bug. Monticello packages are used
on different platforms and therefore need to be platform independent. So
writing a Monticello package in a Pharo-specific way is a problem. Making
GemStone able to read those packages just prevents someone from fixing
the broken character encoding in Monticello, so this makes the situation
worse.
As far as I know, all platforms are capable of converting strings internally
when they find that a multi-byte representation is needed, just as
SmallInteger does. Seen this way, Monticello should store only String/Symbol
objects, encoded as UTF-8 by default. While decoding the UTF-8 bytes, every
platform will see whether a multi-byte representation is needed and will
convert the string internally to its platform-specific form.
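
A minimal sketch of this idea, in Python for illustration only. It assumes the package stores source as UTF-8 and that the loading platform picks the narrowest fixed-width string class that fits (in GemStone terms, roughly String, DoubleByteString, or QuadByteString). The function names are hypothetical and are not part of Monticello or GemStone.

```python
def narrowest_width(text):
    """Return the bytes per character a fixed-width string class would need."""
    highest = max(map(ord, text), default=0)
    if highest <= 0xFF:
        return 1  # fits a single-byte string class
    if highest <= 0xFFFF:
        return 2  # needs a double-byte string class
    return 4      # needs a quad-byte string class


def decode_source(utf8_bytes):
    """Decode UTF-8 source and report the width the platform would choose."""
    text = utf8_bytes.decode("utf-8")
    return text, narrowest_width(text)
```

With this approach the file format never names a platform-specific class; the width is decided only after decoding, on the loading side.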

Re: Issue 306 in glassdb: handling WideStrings instances in mcz files

glassdb

Comment #2 on issue 306 by [hidden email]: handling WideStrings  
instances in mcz files
http://code.google.com/p/glassdb/issues/detail?id=306

Ensuring this is near to impossible.
This is due to the limitations of the underlying DataStream file-out
mechanism.
Ultimately, Monticello’s snapshot.bin is just a serialization of the
MCDefinitions.
Hence, if MCDefinitions contain WideStrings somewhere in them (which may be
the case when some source contains wide strings), they are just written out
as is.

There is special handling of strings in DataStream that was last changed
when ByteStrings vs. WideStrings were introduced in Squeak around 2005.
While ByteStrings are stored by content and marked as strings, WideStrings
are just stored as binary content and marked as being of class “WideString”.

But my point is that this is not ours to change. There are MCZ files out
there that were generated that way, and we cannot change them.

IMHO, we should
• support the strings marked as WideString by converting them on the fly
while loading.
    (This is necessary because we cannot fall back to reading the .st source:
     it may be in UTF-32 once _any_ source contains a wide character, and
     there is no way to know that up front.)
• move away from DataStream soon. But this is another, major discussion.
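
The on-the-fly conversion in the first bullet can be sketched as follows, in Python for illustration. The assumptions are loud ones: the WideString payload is taken to be a run of 32-bit words, big-endian by default, each holding a plain Unicode code point. The function name is hypothetical, and this is not the patch discussed in this thread.

```python
import struct


def wide_words_to_text(payload, byteorder="big"):
    """Turn a raw run of 32-bit words into a string of code points."""
    if len(payload) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(payload) // 4
    # ">" selects big-endian, "<" little-endian; "I" is an unsigned 32-bit word.
    fmt = (">" if byteorder == "big" else "<") + str(count) + "I"
    code_points = struct.unpack(fmt, payload)
    # chr() raises ValueError for values above 0x10FFFF, which flags bad input.
    return "".join(chr(cp) for cp in code_points)
```

A loader would call something like this only when the stored object is marked as WideString, and hand the result to whatever wide string class the platform provides.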


BTW: what about the conversion of WideString to QuadByteString I proposed  
earlier?

Re: Issue 306 in glassdb: handling WideStrings instances in mcz files

glassdb

Comment #3 on issue 306 by [hidden email]: handling WideStrings  
instances in mcz files
http://code.google.com/p/glassdb/issues/detail?id=306

Here is a test file that definitely has WideStrings.

Attachments:
        *Temp-topa.1.mcz  1.1 KB

Re: Issue 306 in glassdb: handling WideStrings instances in mcz files

glassdb

Comment #4 on issue 306 by norbert.hartl: handling WideStrings instances in  
mcz files
http://code.google.com/p/glassdb/issues/detail?id=306

I don't know. The last time I looked at Monticello I saw some wracked setup
with Latin1TextConverter, which isn't used in the case of a multi-byte
string. So that's the main reason it is just written the way it is.
We had this problem a few times in Seaside. The problem occurs if a single
non-8-bit character is in the source code. But the proper fix is to open that
Monticello package in Pharo and remove the bogus characters. Why remove?
Because at this stage Monticello simply does not support anything but ASCII.
Adding defensive behavior that tries to understand things that are broken
just makes it worse. What will happen is that, e.g. in the Seaside case, we
won't have the chance to find out at all. Well, until maybe someone on
another dialect that didn't "fix" Monticello complains about an unloadable
package.
Maybe you should just identify those packages in squeaksource3 and mark
them unreadable for GemStone. Then someone might have a look into it.

Re: Issue 306 in glassdb: handling WideStrings instances in mcz files

glassdb

Comment #5 on issue 306 by [hidden email]: handling WideStrings  
instances in mcz files
http://code.google.com/p/glassdb/issues/detail?id=306

@Dale do you mind attaching my version of converting WideStrings to  
QuadByteStrings?
It is restricted to DataStream and won't touch much more of the system.

Re: Issue 306 in glassdb: handling WideStrings instances in mcz files

glassdb

Comment #6 on issue 306 by [hidden email]: handling WideStrings  
instances in mcz files
http://code.google.com/p/glassdb/issues/detail?id=306

@Norbert ... I agree that it is a bug to include WideStrings in mcz  
packages ... it is a failure of the basic mechanisms ... The right answer  
is to use UTF8 encoding for all of the method source in the mcz ... the  
problem is getting everyone to convert since there is no standard  
Monticello source ... with all of that said, there are a number of mcz  
files out there that work "perfectly fine" and SS3 as a source repository  
should not refuse to accept such mcz files in the repository ...

Sooo, I am inclined to make the support of WideStrings conditional so that  
SS3 (and SmalltalkHub) can serve all mcz files. The standard GemStone  
release will continue to puke on WideStrings, but perhaps provide a useful  
error message ...
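
A rough sketch of what such conditional support could look like, in Python for illustration. All names here (`load_string`, `allow_wide`, `UnsupportedWideStringError`) are hypothetical, and the byte layouts are assumptions: one byte per character for narrow strings, four big-endian bytes per character for WideStrings.

```python
class UnsupportedWideStringError(Exception):
    """Raised when a WideString is met and wide support is switched off."""


def load_string(class_name, payload, allow_wide=False):
    """Decode a stored string, accepting WideStrings only when enabled."""
    if class_name != "WideString":
        # Assumption: narrow strings are stored one byte per character.
        return payload.decode("latin-1")
    if not allow_wide:
        raise UnsupportedWideStringError(
            "mcz contains a WideString instance; re-save the package "
            "without wide characters, or enable WideString support"
        )
    # Assumption: four bytes per character, big-endian, plain code points.
    return payload.decode("utf-32-be")
```

A repository server such as SS3 would pass `allow_wide=True`, while a default installation would keep the flag off and surface the error message instead of failing obscurely.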

Re: Issue 306 in glassdb: handling WideStrings instances in mcz files

glassdb

Comment #7 on issue 306 by [hidden email]: handling WideStrings  
instances in mcz files
http://code.google.com/p/glassdb/issues/detail?id=306

@Tobias, if you've got a proposed solution, go ahead and attach the files  
to this bug report .. Henrik had supplied me with his fix at ESUG and I  
didn't want to lose track of the files ...