File upload - encoding issue

File upload - encoding issue

Dave
Hi there,
I have an issue with file upload. It concerns files with accented characters in the name, for example: hellò.doc.
In the debugger I see that the WAFile filename is hellò7.doc, and it comes from ZnMimePart.

It's clearly an encoding problem, but I don't know where to look.

Can you help me, please?
TIA
 Dave

Re: File upload - encoding issue

Sven Van Caekenberghe-2
Hi Dave,

On 08 Oct 2014, at 15:25, Dave <[hidden email]> wrote:

> Hi there,
> I've an issue with file upload. It's about files with accents in name for
> example: hellò.doc.
> On the debugger I see the WAFile filename is hellò7.doc and it comes from
> ZnMimePart
>
> It's clearly an encoding problem but I don't know where to look at.

Yes, there seems to be an issue here. It can be observed in WAUploadFunctionalTest.

ZnZincServerAdaptor>>#convertMultipartFileField: creates the WAFile instance by falling back on ZnMimePart>>#fileName and ZnMimePart>>#contents. It seems that these are UTF-8 encoded; this could be fixed easily, I guess.
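
To make the symptom concrete, here is a minimal Pharo sketch of how a UTF-8 encoded name gets garbled when its bytes are read one character per byte, and how decoding repairs it. It assumes the convenience methods `utf8Encoded` and `utf8Decoded` are available in the image; the variable names are purely illustrative and this is not the adaptor's actual code.

```smalltalk
| original bytes misread repaired |
original := 'hellò.doc'.
"The browser sends the name as UTF-8: ò becomes the two bytes 195 178"
bytes := original utf8Encoded.
"Taking each byte as one character gives 10 characters instead of 9"
misread := bytes asString.
"Decoding the very same bytes as UTF-8 restores the original name"
repaired := misread asByteArray utf8Decoded.
repaired = original
```

The point of the sketch is only that the bytes themselves are intact; the damage comes from skipping the decoding step.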

The problem is that I am not 100% sure that this is always the case (i.e. part of the spec) and thus safe to do by default.

Any opinions?

Sven

PS: Zinc-HTTP-SvenVanCaekenberghe.412 contains a new ZnDefaultServerDelegate>>#formTest3: that deals successfully with this issue.

> Can you help me, please?
> TIA
> Dave
>
>
>
> --
> View this message in context: http://forum.world.st/File-upload-encoding-issue-tp4783446.html
> Sent from the Seaside General mailing list archive at Nabble.com.
> _______________________________________________
> seaside mailing list
> [hidden email]
> http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/seaside

Re: File upload - encoding issue

Philippe Marschall
On Wed, Oct 8, 2014 at 5:28 PM, Sven Van Caekenberghe <[hidden email]> wrote:

> Hi Dave,
>
> On 08 Oct 2014, at 15:25, Dave <[hidden email]> wrote:
>
>> Hi there,
>> I've an issue with file upload. It's about files with accents in name for
>> example: hellò.doc.
>> On the debugger I see the WAFile filename is hellò7.doc and it comes from
>> ZnMimePart
>>
>> It's clearly an encoding problem but I don't know where to look at.
>
> Yes, there seems to be an issue here. It can be observed in WAUploadFunctionalTest.
>
> ZnZincServerAdaptor>>#convertMultipartFileField: creates the WAFile instance by falling back on ZnMimePart>>#fileName and ZnMimePart>>#contents. It seems that these are UTF-8 encoded, this could be fixed easily I guess.

Do you have information in the request header that suggests UTF-8?

Cheers
Philippe

Re: File upload - encoding issue

Sven Van Caekenberghe-2

On 08 Oct 2014, at 18:08, Philippe Marschall <[hidden email]> wrote:

> On Wed, Oct 8, 2014 at 5:28 PM, Sven Van Caekenberghe <[hidden email]> wrote:
>> Hi Dave,
>>
>> On 08 Oct 2014, at 15:25, Dave <[hidden email]> wrote:
>>
>>> Hi there,
>>> I've an issue with file upload. It's about files with accents in name for
>>> example: hellò.doc.
>>> On the debugger I see the WAFile filename is hellò7.doc and it comes from
>>> ZnMimePart
>>>
>>> It's clearly an encoding problem but I don't know where to look at.
>>
>> Yes, there seems to be an issue here. It can be observed in WAUploadFunctionalTest.
>>
>> ZnZincServerAdaptor>>#convertMultipartFileField: creates the WAFile instance by falling back on ZnMimePart>>#fileName and ZnMimePart>>#contents. It seems that these are UTF-8 encoded, this could be fixed easily I guess.
>
> Do you have information in the request header that suggests UTF-8?

Not that I can see, there are no charset=utf-8 anywhere (but one could assume they are the default):

> Cheers
> Philippe

Re: File upload - encoding issue

Dave
Sven Van Caekenberghe-2 wrote
>> Do you have information in the request header that suggests UTF-8?
>
> Not that I can see, there are no charset=utf-8 anywhere (but one could assume they are the default):

Right, I also can't find where utf-8 is set. Any idea how I can change the charset?
Cheers
 Dave

Re: File upload - encoding issue

Sven Van Caekenberghe-2

On 09 Oct 2014, at 08:46, Dave <[hidden email]> wrote:

> Sven Van Caekenberghe-2 wrote
>>> Do you have information in the request header that suggests UTF-8?
>>
>> Not that I can see, there are no charset=utf-8 anywhere (but one could
>> assume they are the default):
>
> Right, I also can't find where utf-8 is set. Any idea on how can I change
> the charset?

Well, there is an accept-charset="utf-8" in the form, but it does not appear in the submitted form (I only checked one browser). Like I said, I need an informed opinion to help me make a decision here.

As a quick workaround, you can convert the strings you get using

 (GRCodec forEncoding: 'utf-8') decode: 'your string'.
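
As a hedged illustration of this workaround, the sketch below builds a garbled name by hand and runs it through the codec. In an application the garbled string would presumably come from the uploaded file's name rather than being constructed like this; that part, and the variable names, are assumptions for the example.

```smalltalk
| codec garbled fixed |
codec := GRCodec forEncoding: 'utf-8'.
"Simulate what arrives: the UTF-8 bytes of the name, one character per byte"
garbled := 'hellò.doc' utf8Encoded asString.
"The workaround: decode that string as UTF-8"
fixed := codec decode: garbled.
fixed
```

Note that this should only be applied to strings that are known to be undecoded; running it on an already correct name may garble it or fail.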

I will keep this on my todo list. I hope to come up with a better solution.

Sven

Re: File upload - encoding issue

Dave
Sven Van Caekenberghe-2 wrote
> On 09 Oct 2014, at 08:46, Dave <[hidden email]> wrote:
>
>> Sven Van Caekenberghe-2 wrote
>>>> Do you have information in the request header that suggests UTF-8?
>>>
>>> Not that I can see, there are no charset=utf-8 anywhere (but one could
>>> assume they are the default):
>>
>> Right, I also can't find where utf-8 is set. Any idea on how can I change
>> the charset?
>
> Well, there is an accept-charset="utf-8" in the form, but it does not appear in the submitted form (I only checked one browser). Like I said, I need an informed opinion to help me make a decision here.
>
> As a quick workaround, you can convert the strings you get using
>
>  (GRCodec forEncoding: 'utf-8') decode: 'your string'.
>
> I will keep this on my todo list. I hope to come up with a better solution.
>
> Sven
>
>> Cheers
>> Dave

Fine, I'll convert the string, thanks.
 Dave

Re: File upload - encoding issue

Philippe Marschall
In reply to this post by Sven Van Caekenberghe-2
On Thu, Oct 9, 2014 at 9:30 AM, Sven Van Caekenberghe <[hidden email]> wrote:

>
> On 09 Oct 2014, at 08:46, Dave <[hidden email]> wrote:
>
>> Sven Van Caekenberghe-2 wrote
>>>> Do you have information in the request header that suggests UTF-8?
>>>
>>> Not that I can see, there are no charset=utf-8 anywhere (but one could
>>> assume they are the default):
>>
>> Right, I also can't find where utf-8 is set. Any idea on how can I change
>> the charset?
>
> Well, there is an accept-charset="utf-8" in the form, but it does not appear in the submitted form (I only checked one browser). Like I said, I need an informed opinion to help me make a decision here.

The codec on the server adaptor should do the trick. It should match
the page encoding and the accept-charset. Seaside always sets them to
the same value; I did not test which takes precedence in which
browser. I did a quick test and could verify it with UTF-8 and
ISO-8859-1 on Firefox. You can either use the codec on the server
adaptor or ask the codec for the name and do it with the Zinc
adaptors.
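
A minimal sketch of what setting and querying the adaptor codec might look like. The port number is arbitrary, and the accessors `codec:`, `codec` and `name` are written from memory of Seaside and Grease, so they are worth checking against the image in use.

```smalltalk
| adaptor |
adaptor := ZnZincServerAdaptor new.
adaptor port: 8080.
"Use the same encoding as the page and the form's accept-charset"
adaptor codec: (GRCodec forEncoding: 'utf-8').
"Ask the codec for its name, for instance to hand it on to Zinc"
adaptor codec name
```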

Weird things happen in ISO-8859-1 when using code points that do not
fit. E.g., Mac OS X uses NFD, so German umlauts are two code points, with
the second one outside of ISO-8859-1. I did not test UTF-16 or
Shift_JIS.
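
The NFD point can be checked with a small sketch. It builds both forms of u-umlaut by hand from their code points (no normalization library is assumed) and tests whether every character fits in ISO-8859-1; the block name `fitsLatin1` is made up for the example.

```smalltalk
| composed decomposed fitsLatin1 |
"Precomposed form: u-umlaut as the single code point U+00FC"
composed := String with: (Character value: 16rFC).
"NFD form: plain u followed by the combining diaeresis U+0308"
decomposed := WideString with: $u with: (Character value: 16r0308).
fitsLatin1 := [ :aString | aString allSatisfy: [ :each | each asInteger <= 255 ] ].
fitsLatin1 value: composed.    "true"
fitsLatin1 value: decomposed.  "false"
```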

Cheers
Philippe

SmalltalkHub setup questions

JupiterJones
Hi All,

I’m setting up a local SmalltalkHub for my team, helping package distribution, and to learn about Kaliningrad and Amber.

I’m testing by connecting directly to the pharo vm from the browser - I don’t know if there’s a need for WebDAV or not.

It appears to be working but I think I have missed something fundamental :)

1. I register a new user with the seaside UI and it appears in the recently registered list.
2. UI still says “0 registered users”

[Error] TypeError: undefined is not an object (evaluating 'a.length')
        each (jquery-1.7.1.min.js, line 2)
        widget (jQueryUi.js, line 5)
        (anonymous function) (jQueryUi.js, line 5)
        global code (jQueryUi.js, line 5)
[Error] Failed to load resource: the server responded with a status of 500 (Internal Server Error) (users, line 0)
http://localhost:8080/hub/projects/count

3. Login fails - Oops invalid username or password
[Error] Failed to load resource: the server responded with a status of 404 (Not Found) (login, line 0)
http://localhost:8080/hub/login

I tried:

(ShUser selectOne: [ :each | each username = 'jupiter' ])
        validatePassword: 'myPassword'

…and it returned false until I changed ShUser>>#validatePassword:
from:
validatePassword: aString
        ^self password asInteger = (GRPlatform current secureHashFor: aString)
to:
validatePassword: aString
        ^self password = (GRPlatform current secureHashFor: aString) asString

However, login still fails with NotFound. I tried to put a halt in the login handler however it’s not halting when I hit hub/login.

Could something be caching the old method so my halt is not being seen?

Before I start breaking more things, are there any docs for SmalltalkHub? Is there a version that runs in GemStone rather than using Mongo? (just interested) And finally, for Amber development, is there a defined way to load development and popupHelios()?

Any advice would be much appreciated.

I also noticed from the Mongo log that every connection appears to remain open:
2014-10-10T08:44:18.918+1100 [initandlisten] connection accepted from 127.0.0.1:59924 #1 (1 connection now open)
2014-10-10T08:44:18.922+1100 [initandlisten] connection accepted from 127.0.0.1:59925 #2 (2 connections now open)
2014-10-10T08:44:18.923+1100 [initandlisten] connection accepted from 127.0.0.1:59926 #3 (3 connections now open)
2014-10-10T08:44:18.924+1100 [initandlisten] connection accepted from 127.0.0.1:59927 #4 (4 connections now open)
2014-10-10T08:44:18.992+1100 [initandlisten] connection accepted from 127.0.0.1:59928 #5 (5 connections now open)
2014-10-10T08:44:19.010+1100 [initandlisten] connection accepted from 127.0.0.1:59930 #6 (6 connections now open)
2014-10-10T08:44:19.098+1100 [initandlisten] connection accepted from 127.0.0.1:59931 #7 (7 connections now open)
2014-10-10T08:44:19.182+1100 [initandlisten] connection accepted from 127.0.0.1:59932 #8 (8 connections now open)
2014-10-10T08:44:19.266+1100 [initandlisten] connection accepted from 127.0.0.1:59933 #9 (9 connections now open)
2014-10-10T08:44:19.349+1100 [initandlisten] connection accepted from 127.0.0.1:59934 #10 (10 connections now open)
etc.

Is this correct? After 5 minutes playing around I had hundreds of connections “now open”.

Thanks for your time.

Cheers,

J

Re: File upload - encoding issue

Sven Van Caekenberghe-2
In reply to this post by Philippe Marschall
Hi Philippe, Dave,

I made a couple of changes to Zinc to handle the problem (which basically is: mime parts such as uploaded files embedded in multipart/form-data do not have a charset parameter on their mime types, hence the encoding is not known with absolute certainty) and I think I fixed it (for Zn itself, the default encoding now is UTF-8). I added a specific test (ZnServerTests>>#testFormTest3Unspecified) for this case. Additionally, the filename is now also assumed to be UTF-8 encoded (like a file path).
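
The fallback described above can be sketched roughly as follows. This is an illustration of the idea, not the code that went into Zinc: the variable `declaredCharset` stands in for whatever the part's mime type declares, which is typically nothing for an uploaded file.

```smalltalk
| declaredCharset encoder bytes |
"A file part in multipart/form-data usually declares no charset"
declaredCharset := nil.
"Fall back to UTF-8 when nothing was specified"
encoder := ZnCharacterEncoder newForEncoding: (declaredCharset ifNil: [ 'utf-8' ]).
"The UTF-8 bytes of 'hellò'"
bytes := #[104 101 108 108 195 178].
encoder decodeBytes: bytes
```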

For the Zn Seaside adaptor, the story was a bit different. The adaptor uses a special Zn option to read everything binary, as Seaside wants to do its own conversions. That option did not extend to mime parts in multipart/form-data. This is now added and the adaptor now works, without altering ZnZincServerAdaptor>>#convertMultipartFileField:

IMHO though, WAUploadFunctionalTest is wrong. Basically, the use of ISO-8859-1 is questionable and should be replaced with UTF-8 for current browsers (in the methods #renderDownloadLinksOn: and #renderFileContentsOn:). Then those tests pass for uploaded text files that have non-ASCII contents.

The comment in #renderDownloadLinksOn: suggests that this problem (as described in the 1st paragraph) was noted before; the solution or fallback is wrong though, IMHO.

The codec set in the adaptor could indeed be a fallback. I don't know if this can be accessed in regular Seaside code (like in the functional test).

On the other hand, I can't see (and would love an example) where it makes sense, in the 21st century, to not use UTF-8 as a fallback (in case nothing was specified).

In any case, thanks for raising this issue, it helped to improve the code.

Sven

PS: BTW, are there no unit tests that actually stress the functional tests ?

Re: File upload - encoding issue

Philippe Marschall
On Fri, Oct 17, 2014 at 11:25 AM, Sven Van Caekenberghe <[hidden email]> wrote:

> Hi Philippe, Dave,
>
> I made a couple of changes to Zinc to handle the problem (which basically is: mime parts such as uploaded files embedded in multipart/form-data do not have a charset parameter on their mime types, hence the encoding is not known with absolute certainty) and I think I fixed it (for Zn itself, the default encoding now is UTF-8). I added a specific test (ZnServerTests>>#testFormTest3Unspecified) for this case. Additionally, the filename is now also assumed to be UTF-8 encoded (like a file path).
>
> For the Zn Seaside adaptor, the story was a bit different. The adaptor uses a special Zn option to read everything binary, as Seaside wants to do its own conversions. That option did not extend to mime parts in multipart/form-data. This is now added and the adaptor now works, without altering ZnZincServerAdaptor>>#convertMultipartFileField:
>
> IMHO though, WAUploadFunctionalTest is wrong. Basically, the use of ISO-8859-1 is questionable and should be replaced with UTF-8 for current browsers (in the methods #renderDownloadLinksOn: and #renderFileContentsOn:). Then those tests pass for uploaded text files that have non-ASCII contents.
>
> The comment in #renderDownloadLinksOn: suggests that this problem (as described in the 1st paragraph) was noted before, the solution or fallback is wrong though, IMHO.
>
> The codec set in the adaptor could indeed be a fallback. I don't know if this can be accessed in regular Seaside code (like in the functional test).
>
> On the other hand, I can't see (and would love an example) where it makes sense, in the 21st century, to not use UTF-8 as a fallback (in case nothing was specified).

I'll have a look.

> In any case, thanks for raising this issue, it helped to improve the code.
>
> Sven
>
> PS: BTW, are there no unit tests that actually stress the functional tests ?

No, unfortunately there are not. I assume you don't mean unit tests but
functional tests with Selenium or similar.

Cheers
Philippe

Re: File upload - encoding issue

Sven Van Caekenberghe-2

On 17 Oct 2014, at 15:57, Philippe Marschall <[hidden email]> wrote:

>> PS: BTW, are there no unit tests that actually stress the functional tests ?
>
> No unfortunately there are not. I assume you don't mean unit tests but
> functional tests with Selenium or similar.

Well, driving a web browser is one thing, and of course necessary for JavaScript interaction - that is quite complex I guess (I never did it).

But just for rendering and functionality like what we are discussing here, a web client like ZnClient and an XML parser like XMLDOMParser are enough. I did this in my HP-35 tutorial, where web buttons are 'clicked' and the 'display' is read.
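
A rough sketch of the parsing half of that approach, assuming the XML-Parser package is loaded and the page is well-formed XML. The markup is a hand-written stand-in; in a real test it would be fetched from the running application with ZnClient (something like `ZnClient new get:` with the application URL), which is left out here so the snippet runs without a server.

```smalltalk
| html document links |
"Stand-in for markup that a test would fetch from the running application"
html := '<html><body><a href="/next">Next</a></body></html>'.
document := XMLDOMParser parse: html.
"Collect all anchors; a test would pick one and follow its href"
links := document allElementsNamed: 'a'.
links first attributeAt: 'href'
```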

Re: File upload - encoding issue

Johan Brichau-2
In reply to this post by Philippe Marschall

On 17 Oct 2014, at 15:57, Philippe Marschall <[hidden email]> wrote:

>> PS: BTW, are there no unit tests that actually stress the functional tests ?
>
> No unfortunately there are not. I assume you don't mean unit tests but
> functional tests with Selenium or similar.

Work started on that: Seaside-Tests-Webdriver-JohanBrichau.1

Johan

Re: File upload - encoding issue

Johan Brichau-2
In reply to this post by Sven Van Caekenberghe-2

On 17 Oct 2014, at 16:05, Sven Van Caekenberghe <[hidden email]> wrote:

> Well, driving a web browser is one thing, and of course necessary for JavaScript interaction - that is quite complex I guess (I never did it).
>
> But just for rendering and functionality like what we are discussing here, a web client like ZnClient and an XML parser like XMLDOMParser are enough. I did this in my HP-35 tutorial, where web buttons are 'clicked' and the 'display' is read.

With Parasol it’s not complex at all.
The old testing tool (SeasideTesting) does roughly what you describe: parsing. But IMHO, it’s actually a much more difficult codebase.

Johan

Re: File upload - encoding issue

Philippe Marschall
In reply to this post by Sven Van Caekenberghe-2
On Fri, Oct 17, 2014 at 11:25 AM, Sven Van Caekenberghe <[hidden email]> wrote:
> Hi Philippe, Dave,
>
> I made a couple of changes to Zinc to handle the problem (which basically is: mime parts such as uploaded files embedded in multipart/form-data do not have a charset parameter on their mime types, hence the encoding is not known with absolute certainty) and I think I fixed it (for Zn itself, the default encoding now is UTF-8). I added a specific test (ZnServerTests>>#testFormTest3Unspecified) for this case. Additionally, the filename is now also assumed to be UTF-8 encoded (like a file path).
>
> For the Zn Seaside adaptor, the story was a bit different. The adaptor uses a special Zn option to read everything binary, as Seaside wants to do its own conversions.

Not really. Seaside wants a WARequest object (or a subtype). The
adapters in the Seaside repository all do the conversion, but that's
because these servers don't support conversion. That is out of
necessity, not by contract. Seaside should work totally fine if you
came up with a WARequest object that is built from an already parsed
object. The same goes for WAUrl and WAFile. You don't have to use the
class-side parse methods. If you already have parsed objects, it is
totally fine for an adapter to build WAUrl instances with #new and
#addAllToPath: and friends.

> That option did not extend to mime parts in multipart/form-data. This is now added and the adaptor now works, without altering ZnZincServerAdaptor>>#convertMultipartFileField:
>
> IMHO though, WAUploadFunctionalTest is wrong. Basically, the use of ISO-8859-1 is questionable and should be replaced with UTF-8 for current browsers (in the methods #renderDownloadLinksOn: and #renderFileContentsOn:). Then those tests pass for uploaded text files that have non-ASCII contents.

#renderDownloadLinksOn: could probably be fixed if we always use #rawContents.

#renderFileContentsOn: is trickier because we need to know what the
on-disk encoding of the file was. That could have been the operating
system default encoding (UTF-8 on MacOS and modern Linux, maybe UTF-16
on Windows) or something else. We could look for a UTF-16 BOM and, if
it's missing, default to UTF-8.
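
A minimal sketch of that heuristic, assuming the raw bytes of the upload are at hand. The block name `guessEncoding` and the returned encoding names are illustrative labels for this example, not part of Seaside.

```smalltalk
| guessEncoding |
guessEncoding := [ :bytes |
	| bom |
	"Look at the first two bytes, if there are that many"
	bom := bytes size >= 2 ifTrue: [ bytes first: 2 ] ifFalse: [ #[] ].
	bom = #[254 255]
		ifTrue: [ 'utf-16be' ]
		ifFalse: [ bom = #[255 254] ifTrue: [ 'utf-16le' ] ifFalse: [ 'utf-8' ] ] ].
guessEncoding value: #[255 254 104 0].      "'utf-16le'"
guessEncoding value: 'hellò' utf8Encoded.   "'utf-8'"
```

This only distinguishes UTF-16 with a BOM from everything else; a UTF-16 file without a BOM, or a file in a legacy 8-bit encoding, would still be treated as UTF-8.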

Cheers
Philippe