Status: Accepted
Owner: [hidden email]
Labels: Type-Defect Priority-Medium GLASS-Server Version-1.0-beta.8.7

New issue 306 by [hidden email]: handling WideStrings instances in mcz files
http://code.google.com/p/glassdb/issues/detail?id=306

Occasionally an mcz file gets committed to a repository with an embedded WideString instance ... this of course causes havoc if GemStone tries to open the mcz file ...

Henrik Johansen has a set of patches for GemStone that convert WideString instances into QuadByteStrings (and presumably vice versa?).

For SS3 to serve all sorts of mcz files, this type of WideString support is necessary (http://code.google.com/p/squeaksource3/issues/detail?id=17).

I've attached Henrik's mcz files.

Attachments:
Core-HenrikSperreJohansen.35.mcz 117 KB
Monticello-HenrikSperreJohansen.412.mcz 156 KB
GSWideTest-HenrikSperreJohansen.5.mcz 1.4 KB
GSWideTest-HenrikSperreJohansen.4.mcz 1.3 KB
GSWideTest-HenrikSperreJohansen.1.mcz 1.1 KB

Comment #1 on issue 306 by norbert.hartl: handling WideStrings instances in mcz files
http://code.google.com/p/glassdb/issues/detail?id=306

I think this is not a "fix". It covers up a bug. Monticello packages are used on different platforms and therefore need to be platform independent. So writing a Monticello package in a Pharo-specific way is a problem. Making GemStone able to read those packages just prevents someone from fixing the broken character encoding in Monticello, so this makes the situation worse.

As far as I know, all platforms are capable of converting strings internally when they find that a multi-byte representation is needed, just like SmallInteger does. Seen this way, Monticello should store only String/Symbol objects, encoded as UTF-8 by default. While decoding the UTF-8 bytes, every platform will see that a multi-byte representation is needed and will convert the string internally into its platform-specific form.

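A minimal sketch of the decoding idea from comment #1, in Python rather than Smalltalk: the stored bytes are always UTF-8, and the reader picks the narrowest internal representation that can hold the decoded text. The function name `decode_source` and the labels `"byte"`, `"double-byte"` and `"quad-byte"` are hypothetical and only stand in for whatever string classes a given platform provides.

```python
from typing import Tuple


def decode_source(utf8_bytes: bytes) -> Tuple[str, str]:
    """Decode UTF-8 and name the narrowest representation that fits.

    The labels are illustrative stand-ins for platform string classes;
    they are not taken from any real Smalltalk image.
    """
    text = utf8_bytes.decode("utf-8")
    # The largest code point decides how many bytes per character are needed.
    widest = max(map(ord, text), default=0)
    if widest < 0x100:
        kind = "byte"
    elif widest < 0x10000:
        kind = "double-byte"
    else:
        kind = "quad-byte"
    return text, kind
```

The writer never has to know which wide string classes the reader has; the choice is made entirely on the loading side.
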
Comment #2 on issue 306 by [hidden email]: handling WideStrings instances in mcz files
http://code.google.com/p/glassdb/issues/detail?id=306

Ensuring this is near to impossible. This is due to the limitations of the underlying DataStream file-out mechanism. In the end, Monticello’s snapshot.bin is just a serialization of the MCDefinitions. Hence, if MCDefinitions contain WideStrings somewhere in them (which may be the case when some source contains wide strings), they are just written out as is. There is special handling of strings in DataStream that was last changed when ByteStrings vs. WideStrings were introduced in Squeak around 2005.

> While ByteStrings are stored by content and marked as strings, WideStrings are just stored as binary content and marked as of class “WideString”

But my point is, we are not to change this. There are MCZ files out there that were generated that way and we cannot change that. IMHO, we shall

• support the strings marked as WideString by converting them on the fly while loading. (This is necessary as we cannot fall back to reading the .st source, as it may be in UTF-32 once _any_ source contains a wide character, and there is no way to know that up front.)
• move away from DataStream soon. But this is another, major discussion.

BTW: what about the conversion of WideString to QuadByteString I proposed earlier?

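The "convert on the fly while loading" proposal can be sketched as follows, in Python for illustration. This is not the DataStream format: it assumes, purely for the example, that a wide string payload is a run of big-endian 32-bit code points and that a byte string payload is Latin-1. The names `words_to_text` and `load_string` are hypothetical.

```python
import struct


def words_to_text(payload: bytes) -> str:
    """Interpret payload as big-endian 32-bit code points (an assumption)."""
    if len(payload) % 4:
        raise ValueError("payload length is not a multiple of 4")
    count = len(payload) // 4
    code_points = struct.unpack(">%dI" % count, payload)
    return "".join(chr(cp) for cp in code_points)


def load_string(class_name: str, payload: bytes) -> str:
    """Return text for a stored string, converting wide payloads on load."""
    if class_name == "WideString":
        # Binary content marked with the class name: convert while reading.
        return words_to_text(payload)
    # Assumed here: narrow strings are stored one byte per character.
    return payload.decode("latin-1")
```

The conversion lives entirely in the reader, so existing MCZ files stay untouched, which is the constraint the comment insists on.
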
Comment #3 on issue 306 by [hidden email]: handling WideStrings instances in mcz files
http://code.google.com/p/glassdb/issues/detail?id=306

Here is a test file that definitely has WideStrings.

Attachments:
*Temp-topa.1.mcz 1.1 KB

Comment #4 on issue 306 by norbert.hartl: handling WideStrings instances in mcz files
http://code.google.com/p/glassdb/issues/detail?id=306

I don't know. The last time I looked at Monticello I saw some wracked setup with Latin1Textconverter, which isn't used in the case of a multibyte string. So that's the main reason it is written the way it is. We had this problem a few times in Seaside. The problem occurs if a single non-8-bit character is in the source code. But the proper fix is to open that Monticello package in Pharo and remove the bogus characters. Why remove? Because at this stage Monticello just does not support anything but ASCII. Adding defensive behavior that tries to understand things that are broken just makes it worse. What will happen is that, e.g. in the Seaside case, we won't have the chance to find out at all. Well, until maybe someone in another dialect that didn't "fix" Monticello complains about an unloadable package.

Maybe you should just identify those packages in squeaksource3 and mark them unreadable for GemStone. Then someone might have a look into it.

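Identifying such packages could look roughly like the sketch below, written in Python for illustration. It assumes an mcz file is a zip archive whose source text is stored in a member named `snapshot/source.st`; the function name `source_needs_review` is hypothetical. NUL bytes are flagged as well as bytes above 0x7F, because UTF-32 text can consist entirely of bytes below 0x80.

```python
import io
import zipfile


def source_needs_review(mcz_bytes: bytes,
                        member: str = "snapshot/source.st") -> bool:
    """Return True if the source member is not plain ASCII text."""
    with zipfile.ZipFile(io.BytesIO(mcz_bytes)) as archive:
        if member not in archive.namelist():
            # Nothing to inspect; this sketch does not flag the package.
            return False
        data = archive.read(member)
    # Bytes above 0x7F indicate non-ASCII; NUL bytes hint at UTF-32.
    return (not data.isascii()) or (b"\x00" in data)
```

A repository could run a check like this when a file is uploaded and attach a marker to the package instead of rejecting it.
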
Comment #5 on issue 306 by [hidden email]: handling WideStrings instances in mcz files
http://code.google.com/p/glassdb/issues/detail?id=306

@Dale do you mind attaching my version of converting WideStrings to QuadByteStrings? It is restricted to DataStream and won't touch much more of the system.

Comment #6 on issue 306 by [hidden email]: handling WideStrings instances in mcz files
http://code.google.com/p/glassdb/issues/detail?id=306

@Norbert ... I agree that it is a bug to include WideStrings in mcz packages ... it is a failure of the basic mechanisms ...

The right answer is to use UTF8 encoding for all of the method source in the mcz ... the problem is getting everyone to convert since there is no standard Monticello source ... with all of that said, there are a number of mcz files out there that work "perfectly fine" and SS3 as a source repository should not refuse to accept such mcz files in the repository ...

Sooo, I am inclined to make the support of WideStrings conditional so that SS3 (and SmalltalkHub) can serve all mcz files. The standard GemStone release will continue to puke on WideStrings, but perhaps provide a useful error message ...

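The conditional support described in comment #6 amounts to a switch that is off by default, plus an error that tells the user what went wrong. A minimal Python sketch of that decision follows; the names `WideStringNotSupported`, `decide` and `allow_wide` are hypothetical and do not correspond to any GemStone or Monticello API.

```python
class WideStringNotSupported(Exception):
    """Raised when a wide string is met and support is switched off."""


def decide(class_name: str, allow_wide: bool = False) -> str:
    """Return the action to take for a stored string of the given class."""
    if class_name != "WideString":
        return "load"
    if allow_wide:
        # Opt-in path, e.g. for a repository that must serve every file.
        return "convert"
    raise WideStringNotSupported(
        "The package contains a WideString instance, which this loader "
        "does not read. Re-save the package without wide characters or "
        "enable WideString support."
    )
```

With the default setting the loader still refuses the file, but the failure names the cause instead of surfacing as an obscure deserialization error.
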
Comment #7 on issue 306 by [hidden email]: handling WideStrings instances in mcz files
http://code.google.com/p/glassdb/issues/detail?id=306

@Tobias, if you've got a proposed solution, go ahead and attach the files to this bug report ... Henrik had supplied me with his fix at ESUG and I didn't want to lose track of the files ...