I think streams have become my long-term nightmare now :)
The last few days I was able to convert my 2.3.1 GemStone installation to a 2.4.4.1 one. With this change I started to convert some of my projects from Seaside 2.8 to 3.0. The first thing I'd like to convert is a Pier-based project, so I need to move the kernel from the old installation to the new one. The only option is to export the whole Pier kernel and reimport it in the new installation.

The export worked quite well and I have a kernel.xml that contains the whole Pier kernel in UTF-8 encoding. For loading a Pier kernel I have an older code snippet provided by Dale. The problem this snippet solves is that XML files can be pretty big while temporary object space is not. It reads the file in chunks and commits the chunks until the whole file is read. The important piece looks like this

    [ str := gsFile next: 100000.
    str ~~ nil] whileTrue: [
        Transcript show: '|'.
        self persistentString add: str] ...

This piece of code does not deal with character encoding. First I changed

    self persistentRoot add: str

to

    self persistentRoot add: str decodeFromUTF8

But then I realized that reading chunks and decoding afterwards can't really work. A Unicode character might be two or more bytes in UTF-8, and sooner or later a chunk will end in between those bytes. The best solution is probably to decode while reading.

Today I started to solve the problem. I thought a good approach would be to use Grease as a compatibility layer. GRUtf8CodecStream is the one I need. Well, that is true, but what to use on the GemStone side? After a while I realized that if I want to read files from the server there is no other way than to use GsFile. But GsFile itself is not a stream and therefore cannot be wrapped in another stream. The best candidate that uses GsFile is SpFileStream. But it only does "GsFile openRead: ..." and I need "GsFile openReadOnServer: ...". I added

    SpFileStream>>_readingFromGsFile: aGsFile
        filename := aGsFile pathName.
        underlyingStream := aGsFile

to be able to do

    |file|
    file := GsFile open: '/tmp/testfile.txt' mode: 'rb' onClient: false.
    (SpFileStream new _readingFromGsFile: file) contents

but this didn't work either. open:mode:onClient: lets you specify a mode 'rb', which is meant to be read-only/binary mode. But

    |file|
    file := GsFile open: '/tmp/testfile' mode: 'rb' onClient: false.
    (SpFileStream new _readingFromGsFile: file) next class

gives Character back. This fails at the latest in GRUtf8CodecStream, as

    GRUtf8CodecStream>>next: anInteger
        ...
        [ count < anInteger and: [ stream atEnd not ] ] whileTrue: [
            byte1 := stream next.
            unicode := byte1.
            (byte1 bitAnd: 16rE0)
        ...

bitAnd: is not available in class Character.

I can "fix" this if I replace every occurrence of "stream next" in GRUtf8CodecStream>>next: with "stream next asciiValue", but I'm not sure if this is a good change.

To me it boils down to the question of who is responsible for "interpreting" something as "binary". GsFile lets me specify the mode, but it is useless. If this mode is just passed on to an fopen call then this might be a problem. While researching I found this in the fopen manpage:

"The mode string can also include the letter 'b' either as a last character or as a character between the characters in any of the two-character strings described above. This is strictly for compatibility with C89 and has no effect; the 'b' is ignored on all POSIX conforming systems, including Linux."

So if GRUtf8CodecStream>>next: assumes it is dealing with binary data, who is responsible for making it so? Is there a need for a stream that delivers binary data, or should GRUtf8CodecStream just convert the stream results to binary as I did with my change?

What do you think? Or did I overlook something completely and there is an easy solution to this?

thanks,

Norbert
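
The chunk-boundary concern raised in the post can be made concrete with a small workspace sketch. It is not part of Dale's snippet or of Grease: incompleteTail is a hypothetical helper block that inspects the last bytes of a chunk and answers how many of them belong to a UTF-8 sequence that is still missing bytes. It assumes the chunk is a collection of Integers (for example, characters mapped through asciiValue) and well-formed UTF-8 input.

```smalltalk
"Hypothetical helper (not from Grease or GemStone): answers the number of
 trailing bytes in a chunk that start a UTF-8 sequence which is not yet
 complete. Those bytes would have to be carried over to the next chunk
 before decoding."
| incompleteTail |
incompleteTail := [:bytes | | i lead needed have |
    i := bytes size.
    have := 0.
    "Walk backwards over at most three continuation bytes (top bits 10)."
    [i > 0 and: [have < 3 and: [((bytes at: i) bitAnd: 16rC0) = 16r80]]]
        whileTrue: [have := have + 1. i := i - 1].
    i = 0
        ifTrue: [0]
        ifFalse: [
            lead := bytes at: i.
            "Number of continuation bytes the lead byte announces."
            needed := (lead bitAnd: 16r80) = 0
                ifTrue: [0]
                ifFalse: [(lead bitAnd: 16rE0) = 16rC0
                    ifTrue: [1]
                    ifFalse: [(lead bitAnd: 16rF0) = 16rE0
                        ifTrue: [2]
                        ifFalse: [(lead bitAnd: 16rF8) = 16rF0
                            ifTrue: [3]
                            ifFalse: [0]]]].
            "Incomplete only if fewer continuation bytes arrived than announced."
            have < needed ifTrue: [have + 1] ifFalse: [0]]].

"A chunk ending in the lead byte of a two-byte character has one dangling byte."
incompleteTail value: #(16r41 16rC3)
```

The last expression evaluates to 1: the chunk ends with 16rC3, which announces one continuation byte that has not been read yet. A chunked loader could hold back that many bytes and prepend them to the next chunk, which is the bookkeeping a decoding stream does implicitly.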
I think it makes sense to agree that binary is a meaningless designation.
A character-oriented stream should return Unicode characters and application developers shouldn't have to process the individual bytes.

I suspect we need a low-level fix from GemStone here. Different operating systems will have different primitives for dealing with this.

I could imagine writing a user action for OS X that would work correctly, but I know nothing about Unix/POSIX implementations of Unicode-compliant I/O.

Steve

On Mon, Aug 2, 2010 at 6:34 AM, Norbert Hartl <[hidden email]> wrote:
> I think streams has become my long term nightmare now :)
> [...]
> I can "fix" this if I exchange every occurrence of "stream next" in GRUtf8CodecStream>>next: by "stream next asciiValue" but I'm not sure if this is a good change.
Looking at Character>>asInteger it appears the above should work, but I'm sure there are other pitfalls, so I would do the simplest thing possible.

We also need to process multi-byte Unicode strings so I'm hopeful that we don't have a big job to make this work.

Does GemStone support indexing and collation on Unicode strings?

Steve

On Mon, Aug 2, 2010 at 9:29 AM, Steve Wart <[hidden email]> wrote:
> I think it makes sense to agree that binary is a meaningless designation.
> [...]
In reply to this post by NorbertHartl
Norbert Hartl wrote:
> I think streams has become my long term nightmare now :)
> [...]

Norbert,

I think there is a place where you can fix this without too many more modifications.

GsFile>>next is defined to return a character, regardless of the mode that the file is opened in. GsFile>>nextByte will return the asciiValue of the character, so the "fix" would be to create an SpBinaryFileStream that overrides #next, #nextPut:, #nextPutAll:, etc. to use the byte-based methods for GsFile ... then the rest of your plan should work....

Dale
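
A minimal sketch of the read side of Dale's suggestion, written in the same Class>>selector notation used earlier in the thread. SpBinaryFileStream does not exist yet; here it is assumed to be a new subclass of SpFileStream that inherits the underlyingStream instance variable, holding the GsFile as in Norbert's _readingFromGsFile: method. Only GsFile>>nextByte, which Dale names, is used.

```smalltalk
"Assumption: SpBinaryFileStream is a hypothetical subclass of SpFileStream
 whose inherited underlyingStream instance variable holds a GsFile."

SpBinaryFileStream>>next
    "Answer the next byte as an Integer rather than a Character, so that
     senders such as GRUtf8CodecStream>>next: can send bitAnd: to it."
    ^underlyingStream nextByte
```

With this override GRUtf8CodecStream>>next: could stay unchanged, since the conversion to bytes happens in the stream that wraps the file rather than in the codec. The write-side overrides Dale lists (#nextPut:, #nextPutAll:) would follow the same pattern using GsFile's byte-based writing methods, which are left out here.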
In reply to this post by Steve Wart-2
Steve Wart wrote:
>> I can "fix" this if I exchange every occurrence of "stream next" in GRUtf8CodecStream>>next: by "stream next asciiValue" but I'm not sure if this is a good change.
>
> Looking at Character>>asInteger it appears the above should work, but
> I'm sure there are other pitfalls, so I would do the simplest thing
> possible.
>
> We also need to process multi-byte Unicode strings so I'm hopeful that
> we don't have a big job to make this work.
>
> Does GemStone support indexing and collation on Unicode strings?

We support indexing on Strings and DoubleByteStrings (I think we support indexing on QuadByteStrings, but without the encryption optimizations). We also support multiple collation sequences. I haven't worked with them myself, but I think you are able to install just about any collation sequence you desire :)

Dale