Re: (Case INST64237) Should WAFileLibrary convert files in #deployFiles?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: (Case INST64237) Should WAFileLibrary convert files in #deployFiles?

jtuchel

Hi Mariano,


Instantiations Smalltalk Support <[hidden email]> hat am 19. November 2018 um 14:55 geschrieben:

Hi Joachim,

Thanks for keep finding and reporting encoding issues.

I keep enjoying them and like to share the goodness ;-)


Hopefully we will have less and less :)ra

 

You are definitely on track. We've moved to our new server and most of our special character issues are now a thing of the past... Actually this one is the last I currently know of and I am glad about it. I am also glad I know of at least two ways to solve this, even if the fixes don't end up in any shipped product.


I agree and I understand everything you said.

Please keep this sentence for later use. I like it ;-)


 

The main problem here is what I said... as Pharo is UTF-8 by default, Seaside has some assumptions that's the case for everybody everywhere. I dare to say that if you use a different encoding in Pharo (if that's even possible) Seaside will have troubles in there too. 

I was guessing that this is not an issue for Pharo-Seasiders. Good for them ;-)



Anyway... the specific problem is that at the WAServerAdaptor level we KNOW which encoding has been specified and so we delegate to the specified encoder. The usage of the #deployFiles as you said, it's in a completely different moment where you cannot guess anything. You cannot guess which encoding you will specify in the external web server. That's why I am against the change in #write:toFile:inFolder:.

Yes, this was my feeling as well. Not so much the place (write:toFile:inFolder: is in Grease anyways), but because it misses a parameter, namely the charset to convert to. I think we both agree that converting EVERY file to UTF-8 is not a good idea, and there are many imaginable reasons for even not every non-binary file. So my suggestion is only covering my case, not a general case. I wouldn't even dare to call my case an 80% case.



What I propose (and this would work also for the VAST port if upstream Seaside does not want to incorporate it) is to add a new implementations:  #write:toFile:inFolder:encoder: and #deployFilesUsing: anEncoder. Of course, the later passes by the encoder to the former. That way, the developer (that knows and decides which encoding to use in the external web server) can do something like this:

MyFileLibrary deployFilesUsing: *GRCodec named: 'UTF-8')

Sounds much better, and still misses a way to give control over the conversion up to #deployFiles:


Again, we could add these new methods as extension methods of VAST apps. 

What I don't like is the way to distinguish whether to encode or not. The "isString" is weak because at that level of Grease I might provide a text but as ByteArray and that I would like to encode it. I guess I would prefer the FileLibrary to determine wether to encode or not based on the type of file being served. For example, imagine if in #deployFilesUsing: we can do something like this:

deployFiles
    "Write to disk the files that the receiver use to serve as methods.
    The files are stored in a subfolder named like the classname of the receiver in a subfolder of Smalltalk image folder."
    GRPlatform current ensureExistenceOfFolder: self name.
    self fileSelectors do: [ :each | 
        GRPlatform current 
            write: (self perform: each)
            toFile: (self asFilename: each)
            inFolder: self name
            encoding: (( self mimetypeOf: each ) isBinary ifTrue: [nil] ifFalse: [ GRCodec named: 'UTF-8' ]) 
            ]

 

man, the mimetypeOf: was what I was actually looking for. It still wildly guesses on the mimetype and all it has available for it is the filename extension, but that's much better than isString. Absolutely agreed.


That way, on Grease level, if we receive a nil encoding we do nothing. If we receive non nil, we do encode. And we let the caller to decide whether to encode or not rather than deciding on a "isString".

Thoughts?

My gut feeling says we're at the end of the road, because nobody else will care. I would love to see your suggested implementation incorporated in VAST, but don't think it will be accepted by the Seaside devs. OTOH, maybe my assumptioons about this are wrong.

Do you have a channel for pushing changes back? I haven't heard anything about this from anybody on my blog or the mailing list...


Joachim



Mariano

--
Instantiations Smalltalk Support
[hidden email]

-----Original Message-----
From: "[hidden email]" <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Sat, 17 Nov 2018 06:33:43 +0100
To: "Seaside - general discussion" <[hidden email]>
Subject: Should WAFileLibrary convert files in #deployFiles?

>Hi,
>
>I am not sure whether this topic is of interest to anybody not on VA
>Smalltalk. Let me just explain my problem:
>
>VA Smalltalk is not (yet) natively UTF-8. So when you edit code in the
>Smalltalk Browser, it is encoded in the local codepage. In my case this
>is iso-8859-15. Seaside code and file library contents are served using
>a server adaptor. In VAST, this adaptor converts between UTF-8 for
>everything going to or coming from the Browser and the native code page.
>Not sure if this is the case on Pharo, Squeak or VW as well, but I guess so.
>
>So as long as you server the files from the Smalltalk image, special
>characters in, say javascript files, will be converted to UTF.8 when
>delivered to the Browser.
>
>The trouble starts when you use #deployFiles to have the files served by
>a web server (apache, nginx etc.). WAFileLibrary doesn't convert the
>files before writing them to the filesystem.
>
>I hacked GRVASTPlatform to handle this:
>
>write: aStringOrByteArray toFile: aFileNameString inFolder: aFolderString
>    "Writes @aStringOrByteArray to a file named @aFilenameString in
>     the folder @aFolderString."
>
>    |  fullFilePath streamstr|
>
>    fullFilePath := CfsPath fromParts: (Array with: aFolderString with: aFileNameString).
>    stream := fullFilePath newReadWriteStreamBinary: aStringOrByteArray isString not.
>
>  str := aStringOrByteArray isString ifTrue: [aStringOrByteArray
>convertToCodePage: 'UTF-8'] ifFalse: [aStringOrByteArray].   [ stream nextPutAll:str ]
>        ensure: [ stream close ]
>
>The idea is that if teh contents of a method is not binary (like .png or
>.gif etc.), it most likely wants to be in UTF-8 on the file system.
>
>I am not sure, however, if the Grease layer is the right place for this.
>What if other users do not want the files converted to UTF-8. I am also
>not sure if it is okay to convert everything non-binary to UTF-8. Maybe
>there should be a way to tell a Library which codepage to convert to, or
>maybe it is desired to be able to decide for every file individually, or
>maybe even something else.
>
>And, I am not even sure if this is even necessary on other platforms.
>That's why I post here first and ask for comments/discussion.
>
>More on this topic:
>https://joachimtuchel.wordpress.com/2018/11/16/seaside-file-libraries-and-utf-8/
>
>Joachim


_______________________________________________
seaside mailing list
[hidden email]
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/seaside
Reply | Threaded
Open this post in threaded view
|

Re: (Case INST64237) Should WAFileLibrary convert files in #deployFiles?

jtuchel
Hi Mariano,

this looks really good - and works like a charm! It's already in my req maps structure. And it's much better than my half-baked approach.
If this is accepted to go upstream into Seaside, I suggest that deployFiles delegates to this new method, in order to avoid typical DRY problems.

    
deployFiles
    self deployFilesUsing: ("and here comes the trouble: GRCodec has no code to determine the current codepage/encoding in the image... so maybe utf-8 can be defined as a standard parameter here?")


One little comment about method naming:

deployFilesEncodingWhen:with:
deployFilesUsing:

I think the names should be somewhat closer to each other.

Some suggestions for change:

  • deployFilesUsing: and deployFilesUsing:when: (note the changed order of parameters)
  • deployFilesEncoding: and deployFilesEncoding:when:
  • deployFilesUsingCodec: and deployFilesUsingCodec:when:
I think these are better and make clear that the 2-parameter one is just a variant of the other one.

Re: test files. The ones in our code are tons of javascript code with maybe 2 or three Umlauts in them. But they will be added to the DOM and thus are ugly. But for testing you'd have to search a lot of text to spot the umlaut(s)...

I guess a good start for testing would be to implement something like this:

MyFileLibrary>>umlautTest1Js

    ^'console.log("ÄÖÜäöüß")';


Once again: thanks a lot for your help and support! It is great having you as a "cleaner" (remember Jacques the cleaner from Nikita?) in the background ;-)


Joachim




Am 19.11.18 um 17:54 schrieb Instantiations Smalltalk Support:

Hi Joachim,

Please find attached a preview of what I have in mind (map MMP-Incoming). I still need to add comments to every new method as well as trying to add unit tests for that. But it will give you an idea of what I did. You can at least browse changes with whatever you have for 9.1.

There are basically 2 new methods: #deployFilesUsing: aGRCodec  which automatically adds encoding to the selectors that are NOT binary. You can use it like this:

MyFileLibrary deployFilesUsing: ( GRCodec forEncoding: 'utf8' )

In addition, I added #deployFilesEncodingWhen: aBlock with: aGRCodec   (which is called by the previous) in case you want to control when to apply the encoding. Example:

MyFileLibrary

        deployFilesEncodingWhen: [:selector :fileLibrary |
            (fileLibrary mimetypeOf: selector) isBinary not]
        with: aGRCodec

There you can plug whatever closure you want that will be culled with the selector and the file library.

Thoughts?

Mariano

--
Instantiations Smalltalk Support
[hidden email]

-----Original Message-----
From: "Joachim Tuchel" [hidden email]
Reply-To: "Joachim Tuchel" [hidden email]
Date: Mon, 19 Nov 2018 16:09:01 +0100 (CET)
To: "Instantiations Smalltalk Support" [hidden email]
Cc: "Seaside - general discussion" [hidden email]
Subject: Re: (Case INST64237) Should WAFileLibrary convert files in #deployFiles?

>Hi Mariano,
>
>> Instantiations Smalltalk Support [hidden email] hat am 19. November 2018 um 14:55 geschrieben:
>>
>>
>> Hi Joachim,
>>
>> Thanks for keep finding and reporting encoding issues.
>>
>
>I keep enjoying them and like to share the goodness ;-)
>
>>
>> Hopefully we will have less and less :)ra
>>
>>
>>
>
>You are definitely on track. We've moved to our new server and most of our special character issues are now a thing of the past... Actually this one is the last I currently know of and I am glad about it. I am also glad I know of at least two ways to solve this, even if the fixes don't end up in any shipped product.
>
>>
>> I agree and I understand everything you said.
>>
>
>Please keep this sentence for later use. I like it ;-)
>
>>
>>
>>
>> The main problem here is what I said... as Pharo is UTF-8 by default, Seaside has some assumptions that's the case for everybody everywhere. I dare to say that if you use a different encoding in Pharo (if that's even possible) Seaside will have troubles in there too.
>>
>
>I was guessing that this is not an issue for Pharo-Seasiders. Good for them ;-)
>
>>
>> Anyway... the specific problem is that at the WAServerAdaptor level we KNOW which encoding has been specified and so we delegate to the specified encoder. The usage of the #deployFiles as you said, it's in a completely different moment where you cannot guess anything. You cannot guess which encoding you will specify in the external web server. That's why I am against the change in #write:toFile:inFolder:.
>>
>
>Yes, this was my feeling as well. Not so much the place (write:toFile:inFolder: is in Grease anyways), but because it misses a parameter, namely the charset to convert to. I think we both agree that converting EVERY file to UTF-8 is not a good idea, and there are many imaginable reasons for even not every non-binary file. So my suggestion is only covering my case, not a general case. I wouldn't even dare to call my case an 80% case.
>
>>
>> What I propose (and this would work also for the VAST port if upstream Seaside does not want to incorporate it) is to add a new implementations: #write:toFile:inFolder:encoder: and #deployFilesUsing: anEncoder. Of course, the later passes by the encoder to the former. That way, the developer (that knows and decides which encoding to use in the external web server) can do something like this:
>>
>> MyFileLibrary deployFilesUsing: *GRCodec named: 'UTF-8')
>>
>
>Sounds much better, and still misses a way to give control over the conversion up to #deployFiles:
>
>>
>> Again, we could add these new methods as extension methods of VAST apps.
>>
>> What I don't like is the way to distinguish whether to encode or not. The "isString" is weak because at that level of Grease I might provide a text but as ByteArray and that I would like to encode it. I guess I would prefer the FileLibrary to determine wether to encode or not based on the type of file being served. For example, imagine if in #deployFilesUsing: we can do something like this:
>>
>> deployFiles
>> "Write to disk the files that the receiver use to serve as methods.
>> The files are stored in a subfolder named like the classname of the receiver in a subfolder of Smalltalk image folder."
>> GRPlatform current ensureExistenceOfFolder: self name.
>> self fileSelectors do: [ :each |
>> GRPlatform current
>> write: (self perform: each)
>> toFile: (self asFilename: each)
>> inFolder: self name
>> encoding: (( self mimetypeOf: each ) isBinary ifTrue: [nil] ifFalse: [ GRCodec named: 'UTF-8' ])
>> ]
>>
>>
>>
>
>man, the mimetypeOf: was what I was actually looking for. It still wildly guesses on the mimetype and all it has available for it is the filename extension, but that's much better than isString. Absolutely agreed.
>
>>
>> That way, on Grease level, if we receive a nil encoding we do nothing. If we receive non nil, we do encode. And we let the caller to decide whether to encode or not rather than deciding on a "isString".
>>
>> Thoughts?
>>
>
>My gut feeling says we're at the end of the road, because nobody else will care. I would love to see your suggested implementation incorporated in VAST, but don't think it will be accepted by the Seaside devs. OTOH, maybe my assumptioons about this are wrong.
>
>Do you have a channel for pushing changes back? I haven't heard anything about this from anybody on my blog or the mailing list...
>
>Joachim
>
>>
>> Mariano
>>
>> --
>> Instantiations Smalltalk Support
>> [hidden email]
>>
>>



_______________________________________________
seaside mailing list
[hidden email]
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/seaside
Reply | Threaded
Open this post in threaded view
|

Re: (Case INST64237) Should WAFileLibrary convert files in #deployFiles?

jtuchel
In reply to this post by jtuchel
Hi Mariano,


I also tested it with our production code and it works very well. So I
think it can go on its journey to 9.2. Let it loose and continue your
joruney to new frontiers ;-)


Joachim


_______________________________________________
seaside mailing list
[hidden email]
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/seaside