Encoding on saving html file

Encoding on saving html file

Dave
Hi folks,

I have an encoding issue when saving an HTML page. The page contains strings with accented letters such as è or é.
I save the page by retrieving the stream contents with:

html context document stream contents

but every accented character is encoded incorrectly. Can you help me?

Thanks
 Dave

Re: Encoding on saving html file

Stephan Eggermont-3
On 17-10-15 10:27, Dave wrote:
> html context document stream

Inspect the stream to see which encoder/codec/converter it uses.

Stephan

_______________________________________________
seaside mailing list
[hidden email]
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/seaside

Re: Encoding on saving html file

Dave
Hi Stephan

The stream is:

'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head><title>Title</title><meta http-equiv="Content-Type" content="text/html;charset=utf-8"/><meta http-equiv="Content-Script-Type" content="text/javascript"/><meta name="viewport" content="width=device-width, initial-scale=1.0"/></head><body onload="onLoad()"><br/><h4 style="margin-left:56px"> 17/10/2015</h4><div style="display: block;
overflow: visible;
font-family: Monaco, Consolas, monospace;
font-size: 14px;
line-height: 1.5;
white-space: pre-wrap;
margin-left:56px;
margin-right:36px;
padding:10px;
border-top-style:solid; border-width:1px;">perchÃ©</div><br/><br/>'
As you can see, the last line contains "perchÃ©" instead of "perché".
Thanks
Dave

Re: Encoding on saving html file

Stephan Eggermont-3
On 17/10/15 14:32, Dave wrote:
> Hi Stephan
>
> The stream is:
>
>
> You can see the last line contains "perchÃ©" instead of "perché"

So that's fine, then. That's the UTF-8 representation of é.
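A quick way to see what Stephan means, sketched in Python purely to show the byte-level mechanics (this is not Seaside code): `é` is encoded in UTF-8 as the two bytes 0xC3 0xA9, and a string that holds one byte per character (assumed here to behave like Latin-1, as Pharo's ByteString does for values 0-255) displays those two bytes as `Ã©`.

```python
# Python illustration only (not Seaside/Pharo code).
text = "perché"

# UTF-8 encodes 'é' as two bytes.
utf8_bytes = text.encode("utf-8")
print(list(utf8_bytes[-2:]))  # [195, 169], i.e. 0xC3 0xA9

# Show every byte as one character; Latin-1 maps bytes 0-255 one-to-one.
as_byte_string = utf8_bytes.decode("latin-1")
print(as_byte_string)  # perchÃ©
```

So the printed stream is not corrupted; it is correct UTF-8 viewed one byte at a time.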

Stephan

Re: Encoding on saving html file

Dave
But if you save the text of my previous email to an HTML file and open it, it shows "perchÃ©" instead of "perché".

:-(

Dave

Re: Encoding on saving html file

Stephan Eggermont-3
On 17/10/15 15:46, Dave wrote:
> But if you save the text of my previous email to an html file and open it, it
> shows "perchÃ©" instead of "perché"

That depends on how you save it.
You probably want to use a UTF8TextConverter somewhere.
It is used in MultiByteBinaryOrTextStream class>>on:encoding:
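Stephan's advice amounts to encoding Unicode text exactly once, at the moment it is written to disk. A minimal Python analogy (not the Pharo converter API; `save_html` is a hypothetical helper): open the file as a text stream with an explicit UTF-8 encoding and write a Unicode string to it.

```python
import os
import tempfile


def save_html(path: str, html: str) -> None:
    """Hypothetical helper: write a Unicode string, encoding it as UTF-8 once."""
    with open(path, "w", encoding="utf-8") as out:
        out.write(html)


path = os.path.join(tempfile.mkdtemp(), "test.html")
save_html(path, '<meta charset="utf-8"><div>perché</div>')

# Read the raw bytes back to check what actually landed on disk.
with open(path, "rb") as f:
    raw = f.read()
print(raw)  # b'<meta charset="utf-8"><div>perch\xc3\xa9</div>'
```

The key point is that the string handed to the UTF-8 stream is plain Unicode text, not text that has already been encoded.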

Stephan

Re: Encoding on saving html file

Johan Brichau-2
You might want to look at [1] for a related discussion about this issue.

Instead of getting the stream contents from your context and saving that, try the following:

        | fullDocument |
        fullDocument := WAHtmlCanvas builder
                fullDocument: true;
                rootBlock: [:root | root meta contentType: (WAMimeType textHtml charset:'utf-8') ];
                render: [ :html | html text: 'Ñuñoa' ].
         
         '/tmp/test.html' asFileReference writeStreamDo: [ :out | out << fullDocument ].

If you really want to create a text file, how are you saving and viewing that text file once you get the contents from a Seaside document?

Johan

> On 17 Oct 2015, at 16:44, Stephan Eggermont <[hidden email]> wrote:
>
> On 17/10/15 15:46, Dave wrote:
>> But if you save the text of my previous email to an html file and open it, it
>> shows "perché" instead of "perché"
>
> That depends on how you save it.
> You probably want to use an UTF8TextConverter somewhere
> It is used in MultiByteBinaryOrTextStream class>>on:encoding:
>
> Stephan

Re: Encoding on saving html file

Dave
In reply to this post by Stephan Eggermont-3
Hi Stephan,
I couldn't find anything useful on MultiByteBinaryOrTextStream, but I found help in an earlier thread of mine (from time to time I run into encoding issues :-) ), http://forum.world.st/File-upload-encoding-issue-tp4783446p4783615.html, and Sven's reply there helped me again. I decoded the document using:
(GRCodec forEncoding: 'utf-8') decode: html context document stream contents.
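What this decode step does can be sketched in Python, assuming the document contents hold one UTF-8 byte per character. The hypothetical `decode_byte_string` helper below is an analogy, not GRCodec's implementation: it maps each character back to its byte value and decodes those bytes as UTF-8.

```python
def decode_byte_string(byte_string: str) -> str:
    """Hypothetical helper: recover Unicode text from a one-byte-per-char string.

    Latin-1 maps code points 0-255 back to the same byte values, so encoding
    with it yields the original UTF-8 bytes, which are then decoded.
    """
    return byte_string.encode("latin-1").decode("utf-8")


print(decode_byte_string("perchÃ©"))  # perché
```

After this step the string is ordinary Unicode again, so it must still be encoded once when it is written to the file.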

Anyway, if there is a better solution, I'll gladly try it.
Dave

Re: Encoding on saving html file

Dave
In reply to this post by Johan Brichau-2
Hi Johan,
By [1] I guess you mean this: http://forum.world.st/Accent-in-generated-pages-tp4832873p4832910.html

I tried changing

    html text: 'Ñuñoa'

to

    html html: contextFromMyDocument

and saving it as you suggest, but it doesn't work: the text contains perchÃ©.

I tried something different, as Sven suggested some time ago (http://forum.world.st/File-upload-encoding-issue-tp4783446p4783615.html), i.e.

    (GRCodec forEncoding: 'utf-8') decode: html context document stream contents

and it works, but if you can explain how to make your example work I'd be glad.

Thanks
Dave

Re: Encoding on saving html file

Johan Brichau-2

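> On 17 Oct 2015, at 22:46, Dave <[hidden email]> wrote:
>
> With [1] I guess you are linking to this:
> http://forum.world.st/Accent-in-generated-pages-tp4832873p4832910.html

Sorry, forgot to include the link :/

> I tried to change html text: 'Ñuñoa' with html html: contextFromMyDocument and save it as you suggest, but it doesn't work,
> text contains perchÃ©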

Which, as Stephan noted, is the UTF-8 encoding of 'perché'. You can test it:

(GRCodec forEncoding: 'utf8') decode: 'perchÃ©'

So, how are you opening the text file? And how are you saving it to disk?
Because if you save it as bytes to a file, it will be correctly UTF-8 encoded.
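A Python sketch of saving "as bytes", assuming (hypothetically) that the document contents are a string that already holds UTF-8 bytes, one per character. Writing each character's code as a single byte produces a valid UTF-8 file with no second conversion; `document_contents` is an illustrative stand-in, not a Seaside object.

```python
import os
import tempfile

# Hypothetical stand-in for the document stream contents:
# already UTF-8 encoded, one byte per character.
document_contents = "<div>perchÃ©</div>"

path = os.path.join(tempfile.mkdtemp(), "test.html")
with open(path, "wb") as out:
    # Binary write: each character code (0-255) becomes exactly one byte.
    out.write(document_contents.encode("latin-1"))

# Reading the file back as UTF-8 gives the intended text.
with open(path, encoding="utf-8") as f:
    print(f.read())  # <div>perché</div>
```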

Johan

Re: Encoding on saving html file

Dave
Hi Johan,

You are right: if I save the file as binary, everything is fine.

       
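| htmlFilename stream |
htmlFilename := 'test.html'.
stream := FileStream forceNewFileNamed: htmlFilename.
"Binary mode writes each character code as a single byte, so the
 already UTF-8 encoded document contents are not converted again."
stream binary.
[ stream nextPutAll: html context document stream contents ]
	ensure: [ stream close ].
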
When I tried your snippet, it didn't work because it didn't save as binary (or at least I couldn't switch it to binary). Look here:

 
fullDocument := WAHtmlCanvas builder
        fullDocument: true;
        rootBlock: [ :root | root meta contentType: (WAMimeType textHtml charset: 'utf-8') ];
        render: [ :r | r html: html context document stream contents ].

'/tmp/test.html' asFileReference writeStreamDo: [ :out | out << fullDocument ].
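A likely explanation for why this variant fails, sketched in Python under two assumptions: the document stream contents are already UTF-8 encoded (one byte per character), and `writeStreamDo:` hands back a text stream that encodes to UTF-8 (assumed to be the Pharo default). Passing already-encoded text through `r html:` and then through a UTF-8 stream encodes it twice:

```python
# Assumed: already UTF-8 encoded contents, one byte per character.
document_contents = "perchÃ©"

# A UTF-8 text stream encodes the string a second time.
double_encoded = document_contents.encode("utf-8")
print(double_encoded)  # b'perch\xc3\x83\xc2\xa9'

# A browser honouring charset=utf-8 decodes only once, so the mojibake remains.
print(double_encoded.decode("utf-8"))  # perchÃ©

# Decoding first (as the GRCodec line does) and then encoding once is correct.
fixed = document_contents.encode("latin-1").decode("utf-8").encode("utf-8")
print(fixed.decode("utf-8"))  # perché
```

In other words: either decode first and let the stream encode once, or write the already-encoded contents in binary mode; mixing the two produces `Ã©`.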

Dave