Hi,
I'm trying to make something work, but I'm not sure how to do it, or even whether it's possible at all. I know what I /want/ it to do, but I don't know enough about external memory (etc) to know how close I can get to my aim.

I'll have to start with a bit of background. I'm stalled on several projects for lack of /real/ Unicode handling in Dolphin, so I decided to take a detour and put something together. (The UnicodeString class is no use whatsoever for my purposes; indeed I think that "UnicodeString" is a misnomer -- it should be called something like WideString, since that's what it is, even incomplete as it currently is.)

What I want is the ability to handle Unicode data in all of the defined encodings (at least: UTF-8, UTF-16, UTF-32, and the truly, mind-bendingly, weird encoding that Sun have defined for communicating with the Java VM). Naturally I want the resulting objects to be as String-like as possible (I need {Read/Write}Streams too, but I know how to handle that).

So what I've currently got is a bunch of classes, one for each of the encodings (UTF8String, UTF16String, etc). Each object consists of a ByteArray (or similar), plus a few housekeeping fields (sizeInBytes, sizeInCharacters, ...). The encoding used to map between logical Unicode characters and the actual bytes in the binary data is determined by the class of the object. All that works (as far as I've got, anyway).

One small relevant complexity is that I want to be able to support implicitly null-terminated strings (following the pattern of Dolphin Strings) but don't want to /force/ that, so I have to keep a record of how many bytes are part of the explicit string, as distinct from the size of the ByteArray itself. That may help with other stuff too...

Anyway, that's the background. Now what I'm looking at is how best to use these things for external interfacing.
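The layout described above (one class per encoding, a byte buffer, and separate byte and character counts) can be sketched roughly as follows. This is Python rather than Dolphin Smalltalk, and it is only an illustration of the idea: the base class name `EncodedString`, the `data` and `terminator` attributes, and the `null_terminated` flag are assumptions made for the example, not part of the code being discussed.

```python
class EncodedString:
    """Sketch: a byte buffer plus housekeeping; the subclass fixes the encoding."""

    encoding = None      # set by each subclass
    terminator = b""     # zero bytes appended when null termination is requested

    def __init__(self, text, null_terminated=False):
        payload = text.encode(self.encoding)
        # Bytes that belong to the string itself, excluding any terminator.
        self.size_in_bytes = len(payload)
        # Logical length in Unicode code points.
        self.size_in_characters = len(text)
        tail = self.terminator if null_terminated else b""
        # The buffer may therefore be longer than size_in_bytes.
        self.data = bytearray(payload + tail)

    def __len__(self):
        return self.size_in_characters

    def __str__(self):
        # Decode only the explicit string, never the terminator.
        return self.data[:self.size_in_bytes].decode(self.encoding)


class UTF8String(EncodedString):
    encoding = "utf-8"
    terminator = b"\x00"


class UTF16String(EncodedString):
    encoding = "utf-16-le"
    terminator = b"\x00\x00"


word = UTF8String("na\u00efve", null_terminated=True)
wide = UTF16String("na\u00efve")
```

Here `word` holds five characters in six bytes of UTF-8, inside a seven-byte buffer; keeping `size_in_bytes` separate from the buffer length is what makes the terminator optional.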
I can see how to pass the things out to external code that expects a byte buffer in <whatever> encoding, but how do I handle the reverse? Say that an external function returns a pointer to a null-terminated "string" in UTF-8 format. The easy, but unsatisfactory, way to call it would be to declare the method (to Dolphin) as returning void* or similar, and then leave it up to the custom wrapping code to create a UTF8String which either wrapped the corresponding ExternalAddress (or would it be an LPVOID -- I've never understood the difference?) or which wrapped a ByteArray created by copying the external bytes.

What I really want is to be able to handle such strings more transparently, much as the Dolphin VM does for 8-bit "native" strings. I'd like to be able to declare the external method as something like:

    <stdcall: UTF8String someFunction ...args...>

and have the VM automatically create an instance of UTF8String which wraps the external address. I don't know if that's possible (I don't want to make my strings inherit from ExternalStructure) -- it may even be that it would "just work" with my code more-or-less as it is at present (the UTF8String's "bytes" instvar is at index 1, which I suspect may be necessary).

Alternatively, I would be happy if the VM could be persuaded to copy the external bytes into a (null-terminated) ByteArray and wrap a UTF8String around that (sort of like how I think it handles Strings). I suspect that would require special VM magic, though.

(I'm assuming null-terminated "strings" because if the API is defined to take, say, a void** and a size_t* as parameters that it fills in with a pointer to the buffer and its size, then there's no way that the VM can handle that automatically for me -- I think...)
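For comparison, the "custom wrapping code" route for the two cases above (a returned pointer to a null-terminated string, and a pointer plus size written through out-parameters) might look like the following sketch. It uses Python's ctypes rather than Dolphin, no real external library is called, and `get_buffer`, `copy_c_string` and `copy_sized_buffer` are hypothetical names invented for the illustration.

```python
import ctypes

payload = "caf\u00e9".encode("utf-8")   # 5 bytes: the e-acute takes two

# Stand-ins for memory owned by external code (hypothetical).
terminated = ctypes.create_string_buffer(payload)                  # payload + NUL
unterminated = ctypes.create_string_buffer(payload, len(payload))  # no NUL


def copy_c_string(address):
    """Copy bytes from address up to, but not including, the first zero byte."""
    return ctypes.string_at(address)


def copy_sized_buffer(address, size):
    """Copy exactly `size` bytes, for APIs that report a pointer and a length."""
    return ctypes.string_at(address, size)


def get_buffer(out_pointer, out_size):
    """Hypothetical external API with void** and size_t* out-parameters."""
    out_pointer[0] = ctypes.addressof(unterminated)
    out_size[0] = ctypes.sizeof(unterminated)


# Case 1: the function "returns" a pointer to a null-terminated UTF-8 string.
from_pointer = copy_c_string(ctypes.addressof(terminated))

# Case 2: the function fills in a pointer and a size.
buffer_address = ctypes.c_void_p()
buffer_size = ctypes.c_size_t()
get_buffer(ctypes.pointer(buffer_address), ctypes.pointer(buffer_size))
from_out_params = copy_sized_buffer(buffer_address.value, buffer_size.value)
```

In the first case the terminator alone tells the copying code where to stop, which is why that case is a candidate for automatic handling. In the second, the relationship between the two out-parameters has to be known to whoever does the copy.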
Ideally, I would like my string objects to be able to wrap either ByteArrays or external addresses, but I could live with a split, so that UTF8String (and all the others) existed as both UTF8String (a subclass of AbstractUnicodeString, under SequenceableCollection, with a complete Collections+Unicode implementation, but which could /only/ wrap ByteArrays) and a cut-down ExternalUTF8String (a subclass of ExternalStructure, with a limited protocol, but which can wrap ExternalAddress/LPVOIDs). The actual encoding/decoding logic is split out into separate objects anyway, so having internal and external flavours of each kind of string wouldn't involve too much code duplication.

I hope all that made some kind of sense. Anyone got any ideas, suggestions, or more information on how the automatic creation of object-wrappers works?

Thanks for reading.

    -- chris
"Chris Uppal" <[hidden email]> wrote in message
news:[hidden email]...

> ....
> What I want is the ability to handle Unicode data in all of the defined
> encodings (at least: UTF-8, UTF-16, UTF-32, and the truly, mind-bendingly,
> weird encoding that Sun have defined for communicating with the Java VM).
> Naturally I want the resulting objects to be as String-like as possible
> ...
> ... One small relevant complexity is that I
> want to be able to support implicitly null-terminated strings (following
> the pattern of Dolphin Strings) but don't want to /force/ that, so I have
> to keep a record of how many bytes are part of the explicit string, as
> distinct from the size of the ByteArray itself. That may help with other
> stuff too...

Why bother allowing both cases? Dolphin itself doesn't rely on strings being null-terminated, but it always null-terminates its own strings at creation time by allocating an extra character, which is implicitly initialized to zero as a result of the normal memory initialization performed by the object memory. The extra space is not included in the size reported for the object. This behaviour is controlled by a behaviour bit that specifies whether instances are null-terminated or not - it's only relevant to byte objects though. Why not take advantage of that for your UTF strings?

> Anyway, that's the background. Now what I'm looking at is how best to use
> these things for external interfacing.

Well, first off I would be careful about conflating the internal and external. I suggest you consider providing a separate class to represent externally created objects should you need them. If we were building this into the base system we would first build UTF strings for internal manipulation, so those would go in the collection hierarchy and probably involve some refactoring of String itself. External interface usage would be a secondary consideration, and would most likely require some additional classes in the ExternalStructure hierarchy.
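The null-termination scheme described earlier in this reply (one extra, zero-initialized byte that is not counted in the reported size) can be illustrated with a small sketch. It is written in Python, not Dolphin Smalltalk, and only models the idea: `NullTerminatedBytes` and `as_c_bytes` are hypothetical names, and in Dolphin the allocation is done by the VM rather than by class code like this.

```python
class NullTerminatedBytes:
    """Byte buffer with one hidden trailing zero byte (hypothetical name)."""

    def __init__(self, size):
        # bytearray(n) is zero-filled, so the extra byte is already the terminator.
        self._storage = bytearray(size + 1)
        self._size = size

    def __len__(self):
        # The hidden terminator is not counted in the reported size.
        return self._size

    def __getitem__(self, index):
        self._check(index)
        return self._storage[index]

    def __setitem__(self, index, value):
        self._check(index)
        self._storage[index] = value

    def _check(self, index):
        # Keep callers away from the terminator slot.
        if not 0 <= index < self._size:
            raise IndexError("index out of range")

    def as_c_bytes(self):
        """Everything external C code would see, terminator included."""
        return bytes(self._storage)


buf = NullTerminatedBytes(3)
for position, byte in enumerate(b"abc"):
    buf[position] = byte
```

With this arrangement the object reports a size of 3 while the underlying storage is always safe to hand to code that expects a trailing zero, so no separate "is it terminated?" bookkeeping is needed.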
> ...I can see how to pass the things out to
> external code that expects a byte buffer in <whatever> encoding, but how
> do I handle the reverse? Say that an external function returns a pointer
> to a null-terminated "string" in UTF-8 format. The easy, but
> unsatisfactory, way to call it would be to declare the method (to Dolphin)
> as returning void* or similar, and then leave it up to the custom wrapping
> code to create a UTF8String which either wrapped the corresponding
> ExternalAddress (or would it be an LPVOID -- I've never understood the
> difference?) or which wrapped a ByteArray created by copying the external
> bytes.

First, let's cover LPVOID vs ExternalAddress - it may help. The purpose of LPVOID is to represent situations where you need a reference to an address (i.e. a double indirection). It lives in the ExternalStructure hierarchy, which is able to represent both a "value" instance and a "reference" instance. Value instances hold an internally allocated ByteArray. Reference instances hold an ExternalAddress instance which points at the data, which has usually been allocated from some external heap. ExternalAddress has a special behaviour bit set to indicate to the VM that it is an "indirection" object, so that it is implicitly indirected in certain primitives.

Value instances of LPVOID are not useful - it is always used with reference instances to represent a pointer to a pointer. Actually LPVOID is needed in very few cases - typically only in callback situations, or sometimes when doubly-indirected pointers are embedded in arrays or structures. The majority of the time ExternalAddress instances are used.

>
> What I really want is to be able to handle such strings more
> transparently, much as the Dolphin VM does for 8-bit "native" strings.
> I'd like to be able to declare the external method as something like:
>     <stdcall: UTF8String someFunction ...args...>
> and have the VM automatically create an instance of UTF8String which wraps
> the external address. I don't know if that's possible (I don't want to
> make my strings inherit from ExternalStructure) -- it may even be that it
> would "just work" with my code more-or-less as it is at present (the
> UTF8String's "bytes" instvar is at index 1, which I suspect may be
> necessary).

It is possible, and it probably will just work. The classes do not have to be ExternalStructures, but they have to be shaped like them. The VM has fairly flexible capabilities for creating return values, and for creating objects passed to callbacks (which amounts to the same thing).

You can return a "structure" by value by declaring it as in your example - of course the VM has to know how large the object is. It gets this information by accessing the byte size information held in some extra behaviour bits. If you browse the ExternalStructure hierarchy you will be able to find where this gets set. The VM will then create an instance of the declared object type, and a ByteArray of the byte size stored in the class. It stores this byte array in the first instance variable of the structure object. It also copies the data into the ByteArray, either from the stack or from registers, depending on the size of the structure and the calling convention.

The VM can also create byte objects directly to represent structure values, which it will do if the structure class in the declaration is a byte class and again has the byte size encoded in the behaviour bits. GUID is an example of such a class in the image.

> ...Alternatively, I
> would be happy if the VM could be persuaded to copy the external bytes
> into a (null-terminated) ByteArray and wrap a UTF8String around that (sort
> of like how I think it handles Strings). I suspect that would require
> special VM magic, though.
>
> (I'm assuming null-terminated "strings" because if the API is defined to
> take, say, a void** and a size_t* as parameters that it fills in with a
> pointer to the buffer and its size, then there's no way that the VM can
> handle that automatically for me -- I think...).

Correct. To marshal such cases automatically at the VM level there would need to be more information in the declarations - i.e. it would need to be more like IDL, where the direction of each parameter is specified along with the relationship between the size parameter and the buffer.

>
> Ideally, I would like my string objects to be able to wrap either
> ByteArrays or external addresses, but I could live with a split, so that
> UTF8String (and all the others) existed as both UTF8String (a subclass of
> AbstractUnicodeString, under SequenceableCollection, with a complete
> Collections+Unicode implementation, but which could /only/ wrap
> ByteArrays) plus a cut-down ExternalUTF8String (a subclass of
> ExternalStructure, with a limited protocol, but which can wrap
> ExternalAddress/LPVOIDs). The actual encoding/decoding logic is split out
> into separate objects anyway, so having internal and external flavours of
> each kind of string wouldn't involve too much code duplication.
>

As I say, I would recommend separate classes along the lines of the design you suggest, although for efficiency reasons I think you can and should avoid the indirection to a ByteArray by using byte classes directly. Of course if that makes the implementation more complex or confusing it can be left for a later exercise. Generally speaking you need to explicitly marshal externally allocated data at some point anyway, so it can get confusing if you try to do it all in one class.

> I hope all that made some kind of sense. Anyone got any ideas,
> suggestions, or more information on how the automatic creation of
> object-wrappers works?
>
> Thanks for reading.
>

Hope this helps

Regards

Blair
Blair,

> Hope this helps

It does indeed. Many thanks for the explanations and suggestions; I can -- I think -- see where I'm going with this now.

    -- chris