Great job with IMAP interface

Great job with IMAP interface

Louis LaBrunda (Saturday, March 27, 2021 at 10:43:14 AM UTC-4)
Hi All,

I have an old program that reads the email headers from various email accounts and displays them in two lists, those that it thinks I want to delete and those that I want to keep.  The program was written a long time ago with Totally Objects Socket Set.

Socket Set only does POP3, so I decided to upgrade the program to IMAP.  I really like the Instantiations IMAP interface.

With POP3, once an email is deleted, it is gone.  With IMAP it is flagged as deleted but remains until it is expunged, so it can be recovered.  I like this because every once in a while I would accidentally delete emails before having read them into my email client.  With IMAP I can delete them and undelete them before they are gone for good.  There is one problem with that.  My email client (Forte Agent, which is old but I like it) reads via POP3 and still sees and reads the "deleted" email.
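The flag-then-expunge behaviour described above can be sketched with Python's standard-library imaplib module.  This is only an illustration and not the Instantiations API; the helper names mark_deleted, undelete and purge_deleted are made up for this example, and the connection is assumed to be logged in with a mailbox already selected.

```python
import imaplib


def mark_deleted(conn: imaplib.IMAP4, msg_num: str) -> None:
    # Add the \Deleted flag; the message stays in the mailbox for now.
    conn.store(msg_num, "+FLAGS", "\\Deleted")


def undelete(conn: imaplib.IMAP4, msg_num: str) -> None:
    # Remove the \Deleted flag again, which "recovers" the message.
    conn.store(msg_num, "-FLAGS", "\\Deleted")


def purge_deleted(conn: imaplib.IMAP4) -> None:
    # Permanently remove every message in the selected mailbox
    # that still carries the \Deleted flag.
    conn.expunge()
```

Until purge_deleted is called, a flagged message can still be brought back with undelete.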

IMAP to the rescue.  With IMAP I can move emails to another mailbox.  Base IMAP really doesn't have a move command (MOVE is only an optional extension), but it does have a copy command.  Instantiations made this easy with their #move:to: method that does a copy and then deletes (flags and expunges) the email from the original mailbox.  It saved me the trouble of writing it.
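The copy, flag and expunge sequence can be sketched as follows, again using Python's imaplib rather than the VAST classes.  The function name move_message is hypothetical, and this is not a description of how #move:to: is implemented internally; it only mirrors the three steps named above.

```python
import imaplib


def move_message(conn: imaplib.IMAP4, msg_num: str, target_mailbox: str) -> bool:
    # Step 1: copy the message into the target mailbox.
    status, _ = conn.copy(msg_num, target_mailbox)
    if status != "OK":
        # Leave the original untouched if the copy did not succeed.
        return False
    # Step 2: flag the original as deleted.
    conn.store(msg_num, "+FLAGS", "\\Deleted")
    # Step 3: expunge.  Note that this removes every message flagged
    # \Deleted in the selected mailbox, not only this one.
    conn.expunge()
    return True
```

With a real server this would be called after login and select, for example move_message(conn, "1", "Archive"), where "Archive" stands for whatever the archive mailbox is called.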

If I mess up, I can recover emails from the archive mailbox.  At the end of the day, when I'm sure I have read all the emails I'm interested in, I delete them for good from the archive mailbox.

Thanks guys, for a nice job.

Lou

--
You received this message because you are subscribed to the Google Groups "VAST Community Forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/va-smalltalk/525efd07-22ac-45df-93e0-895207d2d11fn%40googlegroups.com.

Re: Great job with IMAP interface

Seth Berman (Saturday, March 27, 2021 at 11:16:37 AM UTC-4)
Thanks Lou,

I'll admit, SMTP was a lot more enjoyable to write than IMAP.
The volume of RFCs to get a handle on for IMAP was considerable because we ended up supporting a lot of extensions.
But it was interesting to do things like IMAP IDLE support.

Many thanks

- Seth



Re: Great job with IMAP interface

Louis LaBrunda (Sunday, March 28, 2021 at 12:08:43 PM UTC-4)
Hi Seth,

I have a question.  Some of the emails have subjects that look like this:

'=?utf-8?Q?33=20Items=20on=20Sale=20at=20up=20to=2040%=20off=21?='

and from>personalName that look like this:

'=?utf-8?Q?MPJA.com=20=2D=20Email=20Specials?='

They look like HTML-encoded strings.  Any idea what I need to do to convert them to cleaner looking text?

Lou 



Re: Great job with IMAP interface

Seth Berman (Sunday, March 28, 2021 at 9:20:02 PM UTC+2)
Hi Lou,

There are probably lots of articles on how specifically to handle these, but this is from RFC 1342: Representation of Non-ASCII Text in Internet Message Headers

The grammar is defined lower in the doc, but from your example we can see a "Quoted-Printable"-style encoding (since the encoding is Q).
You will want to refer to the grammar for encoded-text, which shows that =20 is a <space> and =21 is an <exclamation mark> in UTF-8.

Therefore, the substitution would look like:
encoded-word = "=" "?" charset "?" encoding "?" encoded-text "?" "="
             = "=" "?" utf-8 "?" Q "?" 33=20Items=20on=20Sale=20at=20up=20to=2040%=20off=21 "?" "="
             = "=" "?" utf-8 "?" Q "?" 33 Items on Sale at up to 40% off! "?" "="
             ...
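
For comparison, here is how the same substitution can be done with Python's standard library, which already implements this encoded-word grammar.  It is a sketch in Python rather than Smalltalk, and the helper name decode_mime_header is made up for this example; the two sample strings are the ones quoted in the question.

```python
from email.header import decode_header, make_header


def decode_mime_header(raw: str) -> str:
    # decode_header splits the value into (bytes, charset) chunks;
    # make_header joins them back into one Unicode string.
    return str(make_header(decode_header(raw)))


subject = "=?utf-8?Q?33=20Items=20on=20Sale=20at=20up=20to=2040%=20off=21?="
sender = "=?utf-8?Q?MPJA.com=20=2D=20Email=20Specials?="

print(decode_mime_header(subject))
print(decode_mime_header(sender))
```

The "=2D" in the second string is a hyphen, so the personal name comes out as readable text with no encoding markers left.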


- Seth


Re: Great job with IMAP interface

jtuchel (Monday, March 29, 2021 at 2:09:46 AM UTC-4)
Lou,

As Seth said, this is a very common encoding.  My research pointed me towards RFC2047, but I must admit that I have trouble seeing the forest for the trees when it comes to RFCs.
For the part between "=?UTF-8?Q?" and "?=" you can use my QuotedPrintableCoder from VASTGoodies.  I also have an unfinished version of an RFC2047 coder in my private repository for the surrounding part, which also uses the Base64Encoder for cases in which the encoding is B instead of Q.  I didn't find the time to do the encoding part yet, however.  It shouldn't be too much work, but I had other things to do and so I moved on before it was finished...  I'll contact you in private about this.
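
The split described here (an outer wrapper that picks the charset and encoding, plus an inner Q or B decoder) can be sketched in Python as below.  This is not the QuotedPrintableCoder or the unfinished RFC2047 coder mentioned above; the names ENCODED_WORD and decode_encoded_word are hypothetical, and the sketch handles exactly one encoded-word with no error recovery for unknown charsets.

```python
import base64
import quopri
import re

# charset, encoding (Q or B), encoded-text; none of them may contain "?".
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([QqBb])\?([^?]*)\?=")


def decode_encoded_word(word: str) -> str:
    match = ENCODED_WORD.fullmatch(word)
    if match is None:
        return word  # not an encoded-word, hand it back unchanged
    charset, encoding, payload = match.groups()
    data = payload.encode("ascii")
    if encoding.upper() == "B":
        raw = base64.b64decode(data)  # B means Base64
    else:
        # Q is Quoted-Printable-like; header=True also turns "_" into a space.
        raw = quopri.decodestring(data, header=True)
    return raw.decode(charset)
```

Only the decoding direction is shown, which matches the state of the coder described in this post.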


Joachim





Re: Great job with IMAP interface

Seth Berman (Monday, March 29, 2021 at 3:21:33 PM UTC+2)
Hi Joachim,

I could be looking at a more dated version...I didn't think to follow the RFC update trail.
Basically at the top of the RFCs...such as RFC 1342, you can see "Obsoleted by: 1522"
Then if you visit RFC 1522, you can see the same basic grammar in "2. Syntax of Encoded-Words"
But, we see that this was obsoleted by a whole host of documents...one of them being RFC 2047 which contains that grammar in "2. Syntax of encoded-words".

It basically looks the same, but I would certainly use the newest one.

- Seth


Re: Great job with IMAP interface

jtuchel
Hi Seth,


I think I remember reading a comment of yours about the fun of reading RFCs... Imagine you are not a native speaker and try again ;-)

I don't think the QuotedPrintable stuff has changed much over time. The problem with its documentation to me seems to be the fact that it is described in multiple RFCs and I think I even saw some tiny differences between them. I initially implemented the QuotedPrintableCoder to read .VCS and .ics files, not knowing that it's used in many more contexts.
The bigger problem to me was that it is embedded in other encodings like the one described in 2047. And going through RFCs increasingly gives you the feeling no matter what you implement, there will be ever more RFCs putting this one detail into more and more contexts. It feels like rolling a rock up an endless chain of hills... you always fear whatever you implemented so far, it's not good enough for more purposes than the one or two you use it in...

So I do have a starting point of an RFC2047 coder (it does decoding, but no encoding yet, because I didn't need it yet), but it is far from complete, so I haven't published it yet.... I sent it to Lou to see whether it does work for his purposes as well and who knows, maybe he will do a bit more testing and maybe even add some functionality.
AFAIK, Pharo 9 (maybe it was even Pharo 8, I don't remember) ships with an RFC2047 coder based on / inherited from my QuotedPrintableCoder, so looking into "backporting" this stuff to VAST is an even better option...

Joachim





Re: Great job with IMAP interface

Louis LaBrunda (Friday, April 2, 2021 at 4:40:28 PM UTC-4)
Hi Seth,  Joachim and everyone,

Just for fun I wrote an RFC2047 decoder.  I had an idea for a way of converting this stuff that I decided to play with.  Years ago, when I did conversions like this (I don't remember what was being converted, that's how long ago it was) I used a thing called "state diagrams".  State diagrams are a way of drawing the steps (states) a machine goes through to perform a process based upon inputs.  They are kinda like a flow chart.  I think they were used to design things like vending machines before integrated circuits were cheap and readily available.  I never used them for machines myself but borrowed the idea for conversions.  They are drawn with circles that represent the current state and lines that show input and what state the machine goes to next.  States can loop back to themselves.

My hope was that since when in a state, the next character is being looked at and then the state changes, the code for each state would be small and simple.  That should make the code for each state easy to change if the spec of the data being decoded changes.  I made each major state a method.

The decoder can decode multiple encodings (with different code pages) in one string.  That requires recursive calling of methods.  That could be a problem because the methods don't return in a normal way.  That could lead to the stack going very deep.  I throw an event and catch it to end things.

The encoded string for a given encoding can't contain another encoding.  I'm not sure if this is allowed.  If it is, I will have to think about how to do that.
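
To make the state-diagram idea concrete, here is a minimal sketch in Python.  It is not the decoder from this post: it uses a single loop with a state variable instead of one method per state, the state names TEXT, CHARSET, ENCODING and PAYLOAD are invented for the example, and it keeps any whitespace between adjacent encoded sections (RFC 2047 says that whitespace should be dropped).  It does not attempt nested encodings; as far as I can tell from RFC 2047, encoded-text may not contain "?", which would appear to rule nesting out.

```python
import base64
import quopri


def _decode_payload(charset: str, encoding: str, payload: str) -> str:
    data = payload.encode("ascii")
    if encoding.upper() == "B":
        raw = base64.b64decode(data)
    else:  # anything else is treated as Q in this sketch
        raw = quopri.decodestring(data, header=True)
    return raw.decode(charset, errors="replace")


def decode_with_states(raw: str) -> str:
    out = []
    state = "TEXT"
    charset = encoding = payload = ""
    start = 0  # index where the current "=?" was seen
    i = 0
    while i < len(raw):
        ch = raw[i]
        if state == "TEXT":
            if raw.startswith("=?", i):
                state, start = "CHARSET", i
                charset = encoding = payload = ""
                i += 2
                continue
            out.append(ch)
        elif state == "CHARSET":
            if ch == "?":
                state = "ENCODING"
            else:
                charset += ch
        elif state == "ENCODING":
            if ch == "?":
                state = "PAYLOAD"
            else:
                encoding += ch
        else:  # PAYLOAD
            if raw.startswith("?=", i):
                out.append(_decode_payload(charset, encoding, payload))
                state = "TEXT"
                i += 2
                continue
            payload += ch
        i += 1
    if state != "TEXT":
        # Input ended inside an unfinished section: keep it verbatim.
        out.append(raw[start:])
    return "".join(out)
```

Because each section carries its own charset, several sections with different code pages in one string are handled by simply returning to the TEXT state after each one, without recursion.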

Since I still can't get this stupid google groups to upload a file, you can download a file out of the class here: 


All comments are welcome.

Lou


Re: Great job with IMAP interface

Louis LaBrunda (Saturday, April 3, 2021 at 2:20:48 PM UTC-4)
After sleeping on my state machine approach to decoding RFC2047 data, I decided that I had learned a lot but that simple loops could accomplish the conversion without throwing/catching events, and that the depth wasn't going to be a problem.  So I have rewritten the code in that mode.  I will post the code later.

New question.  This is a string I got as the subject of an email:

'=?UTF-8?Q?should?= =?UTF-8?Q?_=F0=9D=91=BA=F0=9D=92=95=F0=9D=92=82=F0=9D=92=84=F0=9D=92=86=F0=9D=92=9A?= =?UTF-8?Q?_=F0=9D=91=A8=F0=9D=92=83=F0=9D=92=93=F0=9D=92=82=F0=9D=92=8E=F0=9D=92=94?= =?UTF-8?Q?_run_for_Governor=3F?='

There are four encoded sections, all claiming to be code page UTF-8.  After conversion I get:

'should  ????????????  ????????????  run for Governor?'

I converted the two middle sections to this: ' 𠑺𠒕𠒂𠒄𠒆𠒚' and ' 𠑨𠒃𠒓𠒂𠒎𠒔'

I'm sure that is correct.  The current code page seems to be 819.  I assume that is correct?

Anyone have any idea what's up?  Is UTF-8 wrong?  Is 819 wrong?

Based on the body of the email I think the two strings should convert to "Stacey Abrams", maybe with some funny formatting like italics.
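
For what it's worth, the bytes in the two middle sections do look like valid UTF-8.  Each four-byte group starting with F0 9D decodes to one character from the Mathematical Alphanumeric Symbols block (above U+FFFF), and those characters have no equivalent in code page 819 (ISO 8859-1), which may be why they come out as question marks.  The Python sketch below decodes the second section and then applies NFKC normalization, which is one possible way to fold such styled letters into plain ones; whether VAST offers an equivalent is not something this thread establishes.

```python
import quopri
import unicodedata

# Second encoded section of the subject, without the "=?UTF-8?Q?" wrapper.
payload = "_=F0=9D=91=BA=F0=9D=92=95=F0=9D=92=82=F0=9D=92=84=F0=9D=92=86=F0=9D=92=9A"

raw = quopri.decodestring(payload.encode("ascii"), header=True)
text = raw.decode("utf-8")  # each 4-byte group becomes one character

# Show which characters these really are.
for ch in text.strip():
    print(f"U+{ord(ch):05X} {unicodedata.name(ch)}")

# Fold the styled letters into their plain compatibility equivalents.
plain = unicodedata.normalize("NFKC", text)
print(repr(plain))

# ISO 8859-1 (code page 819) cannot represent them at all.
latin1 = text.encode("latin-1", errors="replace")
print(latin1)
```

The first character is U+1D47A, MATHEMATICAL BOLD ITALIC CAPITAL S, and after normalization the section reads " Stacey", which matches the guess based on the body of the email.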

Lou



Re: Great job with IMAP interface

Louis LaBrunda
Hi,

Here is another subject that doesn't convert:

'=?UTF-8?Q?=F0=9D=90=8C=F0=9D=90=9A=F0=9D=90=AB=F0=9D=90=AD=F0=9D=90=A2=F0=9D=90=A7?= =?UTF-8?Q?_=F0=9D=90=8B=F0=9D=90=AE=F0=9D=90=AD=F0=9D=90=A1=F0=9D=90=9E=F0=9D=90=AB?= =?UTF-8?Q?_=F0=9D=90=8A=F0=9D=90=A2=F0=9D=90=A7=F0=9D=90=A0?= =?UTF-8?Q?_=F0=9D=90=89=F0=9D=90=AB.?= =?UTF-8?Q?_=F0=9D=90=8B=F0=9D=90=9E=F0=9D=90=A0=F0=9D=90=9A=F0=9D=90=9C=F0=9D=90=B2?= =?UTF-8?Q?_=F0=9D=90=92=F0=9D=90=AE=F0=9D=90=AB=F0=9D=90=AF=F0=9D=90=9E=F0=9D=90=B2?='

Could the Q encoded values be wrong?  They are all above 128 in value.  I have tried all the char sets besides UTF-8 and that doesn't help.  Any ideas?

Lou

P.S. You can download my latest KscRFC2047Decoder.st.




Re: Great job with IMAP interface

Seth Berman
Hi Lou,

Your latest subject contains UTF-8 bold characters.  That's why they all are prefixed with =F0=9D.
It says: 𝐌𝐚𝐫𝐭𝐢𝐧 𝐋𝐮𝐭𝐡𝐞𝐫 𝐊𝐢𝐧𝐠 𝐉𝐫. 𝐋𝐞𝐠𝐚𝐜𝐲 𝐒𝐮𝐫𝐯𝐞𝐲
For example, you can see the 𝐌 is f0 9d 90 8c  (MATHEMATICAL BOLD CAPITAL M)
I found this site which may help you to decode so you know what you are aiming for: https://dogmamix.com/MimeHeadersDecoder/ 
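
To make the byte arithmetic concrete, here is a small sketch of how the four bytes f0 9d 90 8c combine into one code point. It is written in Python purely as an illustration (it is not VA Smalltalk code), and the function name decode_utf8_4byte is made up for this example.

```python
import unicodedata

def decode_utf8_4byte(b0, b1, b2, b3):
    """Combine one 4-byte UTF-8 sequence into a Unicode code point."""
    # The lead byte 11110xxx carries 3 payload bits; each continuation
    # byte 10xxxxxx carries 6 payload bits.
    assert b0 & 0xF8 == 0xF0, "not a 4-byte lead byte"
    for b in (b1, b2, b3):
        assert b & 0xC0 == 0x80, "not a continuation byte"
    return ((b0 & 0x07) << 18) | ((b1 & 0x3F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F)

code_point = decode_utf8_4byte(0xF0, 0x9D, 0x90, 0x8C)
print(hex(code_point))                      # 0x1d40c
print(unicodedata.name(chr(code_point)))    # MATHEMATICAL BOLD CAPITAL M
```

Every byte in such a sequence is 0x80 or higher, which is why all the Q encoded values in that subject are above 128.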

-Seth



Re: Great job with IMAP interface

Louis LaBrunda
Thanks Seth, I'm sure that will be a big help.

Lou



Re: Great job with IMAP interface

Louis LaBrunda
Seth, have I gotten into an area that VA Smalltalk doesn't support yet?  Is this a Unicode code page problem?  Might I hack a solution by subtracting out the "bold" part to bring things down to a regular letter?

Lou



Re: Great job with IMAP interface

Seth Berman
Hi Lou,

"have I gotten into an area that VA Smalltalk doesn't support yet?"
Not necessarily, it all depends on what you want to do with it.
If you want to show it in an editor, save it in source code, or otherwise interpret it...then probably so.
If you want to simply pass it through to somewhere else...then not at all.

I think your issue is that you need to parse those bytes as UTF-8.  So you need a UTF-8 parser (I would think).
Those hex values sit in a Smalltalk 'String' container but that isn't really relevant.  So you don't need to do code page conversion.

Some ideas are:
1. Parse the UTF-8 hex values into a ByteArray.  (Maybe as simple as removing the '=' stuff in-between and pushing those hex converted integer values to a ByteArray).
2. Assuming you want to turn it into a Smalltalk String and accept some loss, then do code page conversion on that ByteArray from UTF-8 -> current code page.
3. If you want to just pass it through to somewhere else, then just leave it as a ByteArray and pass that ByteArray on to something else.
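
The steps above can be sketched outside VA Smalltalk. The Python below is only an illustration, not VAST code and not the KscRFC2047Decoder mentioned earlier in the thread: the names q_decode_to_bytes and decode_subject are made up for this sketch, it handles only the Q encoding, and it assumes well-formed input. A bytes object plays the role of the ByteArray.

```python
import re

# One RFC 2047 encoded-word: =?charset?encoding?payload?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([QqBb])\?([^?]*)\?=")

def q_decode_to_bytes(text):
    """Turn a Q-encoded payload into raw bytes (the 'ByteArray' step)."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        hex_pair = text[i + 1:i + 3]
        if ch == "=" and len(hex_pair) == 2:
            out.append(int(hex_pair, 16))   # "=F0" becomes byte 0xF0
            i += 3
        elif ch == "_":
            out.append(0x20)                # "_" stands for a space
            i += 1
        else:
            out.append(ord(ch))             # plain ASCII passes through
            i += 1
    return bytes(out)

def decode_subject(subject):
    """Decode every Q encoded-word in a header value."""
    # Whitespace that only separates two encoded-words is not part of the text.
    subject = re.sub(r"\?=\s+=\?", "?==?", subject)

    def replace(match):
        charset, encoding, payload = match.groups()
        if encoding.upper() != "Q":
            return match.group(0)           # B encoding is out of scope here
        # Interpret the raw bytes using the charset named in the encoded-word.
        return q_decode_to_bytes(payload).decode(charset)

    return ENCODED_WORD.sub(replace, subject)

raw = q_decode_to_bytes("=F0=9D=90=8C")
print(raw.hex(" "))                          # f0 9d 90 8c
print(decode_subject("=?UTF-8?Q?should?= =?UTF-8?Q?_run_for_Governor=3F?="))
```

For idea 3, the value returned by q_decode_to_bytes can be handed on unchanged; only idea 2 needs the final conversion to a code page.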

- Seth



Re: Great job with IMAP interface

Louis LaBrunda
Hi Seth,

Thanks.

On Monday, April 5, 2021 at 12:31:01 PM UTC-4 Seth Berman wrote:

Some ideas are:
1. Parse the UTF-8 hex values into a ByteArray.  (Maybe as simple as removing the '=' stuff in-between and pushing those hex converted integer values to a ByteArray).
2. Assuming you want to turn it into a Smalltalk String and accept some loss, then do code page conversion on that ByteArray from UTF-8 -> current code page.

This is what I'm doing (I think).  I put the converted hex into a String and not a ByteArray and use convertFromCodePage: 'UTF-8', to convert it to the current code page.  The result is mostly "?" marks.

The current code page is 819.

Lou
 


Re: Great job with IMAP interface

Seth Berman
Hi Lou,

I guess this goes back to what you ultimately want to do with the UTF-8 bytes?
In other words...why must you convert to the current code page? (which btw, I don't think you can do since 819 doesn't have an equivalent for those special letters).
If what you have is a valid set of UTF-8 bytes sitting in a ByteArray...what would you like to do next?
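
The point about code page 819 can be shown with a short sketch. This is Python, used only as an illustration and not VA Smalltalk; it relies on Python's "latin-1" codec being ISO 8859-1, which is the character set IBM numbers as code page 819. How VAST itself substitutes unmappable characters may differ.

```python
# The four UTF-8 bytes for MATHEMATICAL BOLD CAPITAL M, decoded to one character.
bold_m = bytes([0xF0, 0x9D, 0x90, 0x8C]).decode("utf-8")   # U+1D40C

try:
    bold_m.encode("latin-1")
except UnicodeEncodeError as err:
    # Latin-1 only covers code points 0..255, so there is nothing to map to.
    print("no Latin-1 equivalent:", err.reason)

# A lossy conversion has to substitute something, here a question mark.
print(bold_m.encode("latin-1", errors="replace"))   # b'?'
```

So the UTF-8 bytes are valid; the question marks appear only at the step that forces them into a single-byte code page.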

- Seth



Re: Great job with IMAP interface

Louis LaBrunda
Hey Seth,

For my need (I'm not sure about Joachim) I would just like the string to be readable.  So, I don't need the bolding (or whatever), I just need to see what it says.

Because the UTF-8 stuff like "MATHEMATICAL BOLD CAPITAL M" is defined as U+1D40C, I have been trying to subtract out the U part, but that doesn't seem to get me where I want to be.
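
One reason plain subtraction on the bytes may not work: U+1D40C is the code point, while UTF-8 stores it as four bytes (F0 9D 90 8C) with marker bits mixed in. A minimal Python sketch (not VAST code; the function name is made up for illustration) of recovering the code point from those four bytes:

```python
def decode_utf8_4byte(b0, b1, b2, b3):
    """Decode one 4-byte UTF-8 sequence into its code point."""
    # The lead byte looks like 11110xxx and carries 3 payload bits;
    # each continuation byte looks like 10xxxxxx and carries 6 bits.
    return (
        ((b0 & 0x07) << 18)
        | ((b1 & 0x3F) << 12)
        | ((b2 & 0x3F) << 6)
        | (b3 & 0x3F)
    )

code_point = decode_utf8_4byte(0xF0, 0x9D, 0x90, 0x8C)
print(hex(code_point))
```

Any arithmetic on the "U+" number has to happen after this step, on the decoded code point rather than on the raw bytes.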

Lou


Re: Great job with IMAP interface

Seth Berman
Hi Lou,

"I would just like the string to be readable"
- It just depends where you want to read it.
If you dump those bytes to a file and open them in something like Notepad++ in UTF-8 mode...then it will be readable.
If you set the code page of the smalltalk scintilla editor to 65001, then it will be readable (see below)

But those UTF-8 encoded code points are just numbers assigned by a group of people.  The encoding doesn't know that it's technically a decorated ASCII M.  So you would need some program that knew how to perform that conversion to some code page, which is certainly not something that VAST does.
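
As one example of a program that knows this mapping: the Unicode character database records a compatibility mapping from the mathematical bold letters to their plain forms, and compatibility normalization (NFKC) applies it. A minimal sketch in Python, not VAST, where `strip_decoration` is a made-up helper name:

```python
import unicodedata

def strip_decoration(text):
    # NFKC normalization replaces compatibility characters, such as
    # the mathematical bold letters, with their plain equivalents.
    return unicodedata.normalize("NFKC", text)

# "Mail" written with mathematical bold letters.
bold = "\U0001D40C\U0001D41A\U0001D422\U0001D425"
plain = strip_decoration(bold)

# After normalization the text fits in Latin-1 without any "?" marks.
latin1 = plain.encode("latin-1")
print(plain)
```

Whether an equivalent normalization is available from within VAST is not something this thread establishes.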


smalltalk_scintilla.png


Re: Great job with IMAP interface

Louis LaBrunda
Hi Seth,

If you dump those bytes to a file and open them in something like Notepad++ in UTF-8 mode...then it will be readable.

The program I need this for displays the subject as a column in a container in a window.  Joachim might be okay with sending the data to a file.
 
If you set the code page of the smalltalk scintilla editor to 65001, then it will be readable (see below)

How did you do that?  Can I call scintilla to give me a converted string?

How about my idea (maybe crazy idea) to subtract something from each chunk (3 or 4 bytes) to bring it down to a range we can work with?
 
But those UTF-8 encoded code points are just numbers assigned by a group of people.  The encoding doesn't know that it's technically a decorated ASCII M.  So you would need some program that knew how to perform that conversion to some code page, which is certainly not something that VAST does.

This was my guess, I am just trying to hack my way around it for my limited case until you guys do the real conversion.
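
The subtraction idea can be sketched for the limited case of the mathematical bold letters, provided the subtraction is done on decoded code points rather than on the 3 or 4 raw bytes. This is a Python illustration with made-up names, not VAST code, and it assumes only the bold A-Z and a-z ranges matter:

```python
BOLD_UPPER_A = 0x1D400  # MATHEMATICAL BOLD CAPITAL A
BOLD_LOWER_A = 0x1D41A  # MATHEMATICAL BOLD SMALL A

def unbold(text):
    """Map mathematical bold letters back to plain ASCII letters."""
    out = []
    for ch in text:
        cp = ord(ch)
        if BOLD_UPPER_A <= cp < BOLD_UPPER_A + 26:
            # Subtract the start of the bold capitals, add plain "A".
            cp = cp - BOLD_UPPER_A + ord("A")
        elif BOLD_LOWER_A <= cp < BOLD_LOWER_A + 26:
            # Same idea for the bold small letters.
            cp = cp - BOLD_LOWER_A + ord("a")
        out.append(chr(cp))
    return "".join(out)

subject = unbold("\U0001D40C\U0001D41A\U0001D422\U0001D425 ok")
print(subject)
```

Other decorated alphabets (italic, script, and so on) sit at different offsets, so each would need its own range check in a hack like this.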

Lou 
