ZnMimePart


alistairgrant
Hi Sven,

I'm trying to parse maildir emails using ZnMimePart and have a few
questions:

1. The standard indicates that text lines should be terminated with
CRLF, however in practice many maildir format emails (as saved by
offlineimap and successfully handled by mutt) use LF.

Are you open to modifying the parser to be a bit more forgiving, and
allow CRLF, LF or CR?

I could then completely deprecate MIMEDocument from the image and make
MailMessage a bit more useful.



2. ZnStringEntity and ZnByteArrayEntity both appear to answer their
contents without decoding the data based on the
Content-Transfer-Encoding, e.g. if I have:

Date: Sat, 16 Mar 2019 12:00:21 +0100
MIME-Version: 1.0
Content-Type: image/jpeg; name="00585-capture.jpg"
Content-Transfer-Encoding: base64
Content-Disposition: inline; filename="00585-capture.jpg"

base64dataincludedhere...

and send:

aZnMimePart contents

I'll get back the result of evaluating:

'base64dataincludedhere...' asByteArray

instead of the decoded data.

Is this intended?  I'd expect the decoded data (with charset decoding as
well, in the case of text).


3. What do you think of adding a few convenience test methods, e.g.
isImage, isApplication, isText, etc.?


Thanks,
Alistair

Re: ZnMimePart

Sven Van Caekenberghe-2
Hi Alistair,

Sorry that it took a while to answer, but I did not forget.

> On 17 Mar 2019, at 07:55, Alistair Grant <[hidden email]> wrote:
>
> Hi Sven,
>
> I'm trying to parse maildir emails using ZnMimePart

Cool.

> and have a few questions:
>
> 1. The standard indicates that text lines should be terminated with
> CRLF, however in practice many maildir format emails (as saved by
> offlineimap and successfully handled by mutt) use LF.
>
> Are you open to modifying the parser to be a bit more forgiving, and
> allow CRLF, LF or CR?
>
> I could then completely deprecate MIMEDocument from the image and make
> MailMessage a bit more useful.

Yes, being more forgiving about line end conventions would not hurt; it is more or less standard practice in Pharo. Just make sure you don't break anything else.

> 2. ZnStringEntity and ZnByteArrayEntity both appear to answer their
> contents without decoding the data based on the
> Content-Transfer-Encoding, e.g. if I have:
>
> Date: Sat, 16 Mar 2019 12:00:21 +0100
> MIME-Version: 1.0
> Content-Type: image/jpeg; name="00585-capture.jpg"
> Content-Transfer-Encoding: base64
> Content-Disposition: inline; filename="00585-capture.jpg"
>
> base64dataincludedhere...
>
> and send:
>
> aZnMimePart contents
>
> I'll get back the result of evaluating:
>
> 'base64dataincludedhere...' asByteArray
>
> instead of the decoded data.
>
> Is this intended?  I'd expect the decoded data (with charset decoding as
> well, in the case of text).

I am not sure I fully understand: both ZnStringEntity>>#contents and ZnByteArrayEntity>>#contents clearly return the internal string or bytes, so the fully decoded data.

A ZnMimePart considers its entity to be its contents.

The handling of transfer headers is currently located outside these objects, in ZnEntityReader and ZnEntityWriter, as far as I remember. This feels correct.
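
To make the distinction concrete, here is a minimal Playground sketch of decoding a base64 body by hand once the entity has returned it undecoded. It assumes only ZnBase64Encoder from Zinc; the sample string is made up for illustration, and nothing here claims to show how ZnMimePart itself treats Content-Transfer-Encoding.

```smalltalk
"Hypothetical sample: the base64 form of the ASCII text Hello."
| encoded decoded text |
encoded := 'SGVsbG8='.

"Undo the transfer encoding: base64 text to raw bytes."
decoded := ZnBase64Encoder new decode: encoded.

"For a text part, a charset decoding step would follow (UTF-8 assumed here)."
text := decoded utf8Decoded.
```

A real mail body is usually wrapped over many lines, so the line breaks may need to be handled before or during decoding.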

> 3. What do you think of adding a few convenience test methods, e.g.
> isImage, isApplication, isText, etc.?

That would be OK I guess. An alternative would be to add them to ZnMimeType, then more clients could use them (as far as they are universal enough).
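
As a rough sketch of what such tests could look like on ZnMimeType: the selectors isImage and isText below are hypothetical (they are the names proposed in this thread, not existing API), and only ZnMimeType class>>fromString: and #main are assumed to exist.

```smalltalk
"Hypothetical extension methods, shown as comments:
   ZnMimeType >> isImage   ^ self main = 'image'
   ZnMimeType >> isText    ^ self main = 'text'
 The same checks, written inline so they can run in a Playground:"
| type isImage isText |
type := ZnMimeType fromString: 'image/jpeg'.
isImage := type main = 'image'.
isText := type main = 'text'.
```

Placing the tests on ZnMimeType rather than ZnMimePart means any client holding a MIME type could reuse them.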

> Thanks,
> Alistair

Sven



Re: ZnMimePart

alistairgrant
Hi Sven,


On Mon, 15 Apr 2019 at 16:17, Sven Van Caekenberghe <[hidden email]> wrote:
>
> Hi Alistair,
>
> Sorry that it took a while to answer, but I did not forget.

No problem, I'm a bit behind myself. :-)


> > On 17 Mar 2019, at 07:55, Alistair Grant <[hidden email]> wrote:
> >
> > Hi Sven,
> >
> > I'm trying to parse maildir emails using ZnMimePart
>
> Cool.
>
> > and have a few questions:
> >
> > 1. The standard indicates that text lines should be terminated with
> > CRLF, however in practice many maildir format emails (as saved by
> > offlineimap and successfully handled by mutt) use LF.
> >
> > Are you open to modifying the parser to be a bit more forgiving, and
> > allow CRLF, LF or CR?
> >
> > I could then completely deprecate MIMEDocument from the image and make
> > MailMessage a bit more useful.
>
> Yes, being more forgiving about line end conventions would not hurt; it is more or less standard practice in Pharo. Just make sure you don't break anything else.

I ended up writing a wrapper around the stream that converts arbitrary
line endings to CRLF.  After thinking about it a bit further I
actually prefer this approach at the moment.  There are a few places in
the standard where it is quite specific about how to interpret the
various options.  Having the wrapper stream isolates the non-standard
functionality and allows it to be easily removed.
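
The normalization itself can be sketched as follows. This is a minimal string-based illustration of the idea, not the wrapper stream mentioned above; the sample input is made up, and a real wrapper would do the same conversion lazily as the stream is read.

```smalltalk
"Convert lone LF, lone CR and CRLF all to CRLF."
| raw in normalized |
raw := 'a', String lf, 'b', String cr, 'c', String crlf, 'd'.
in := raw readStream.
normalized := String streamContents: [ :out |
	[ in atEnd ] whileFalse: [
		| ch |
		ch := in next.
		ch = Character cr
			ifTrue: [
				"Swallow the LF of a CRLF pair so it is not counted twice."
				in peekFor: Character lf.
				out nextPutAll: String crlf ]
			ifFalse: [
				ch = Character lf
					ifTrue: [ out nextPutAll: String crlf ]
					ifFalse: [ out nextPut: ch ] ] ] ].
```

Keeping the conversion in one place like this is what makes it easy to remove later: the parser downstream only ever sees CRLF.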


> > 2. ZnStringEntity and ZnByteArrayEntity both appear to answer their
> > contents without decoding the data based on the
> > Content-Transfer-Encoding, e.g. if I have:
> >
> > Date: Sat, 16 Mar 2019 12:00:21 +0100
> > MIME-Version: 1.0
> > Content-Type: image/jpeg; name="00585-capture.jpg"
> > Content-Transfer-Encoding: base64
> > Content-Disposition: inline; filename="00585-capture.jpg"
> >
> > base64dataincludedhere...
> >
> > and send:
> >
> > aZnMimePart contents
> >
> > I'll get back the result of evaluating:
> >
> > 'base64dataincludedhere...' asByteArray
> >
> > instead of the decoded data.
> >
> > Is this intended?  I'd expect the decoded data (with charset decoding as
> > well, in the case of text).
>
> I am not sure I fully understand: both ZnStringEntity>>#contents and ZnByteArrayEntity>>#contents clearly return the internal string or bytes, so the fully decoded data.
>
> A ZnMimePart considers its entity to be its contents.
>
> The handling of transfer headers is currently located outside these objects, in ZnEntityReader and ZnEntityWriter, as far as I remember. This feels correct.

I'll have to get back to this.  I'm side-tracked on other stuff at the
moment, but thanks for your explanation.


> > 3. What do you think of adding a few convenience test methods, e.g.
> > isImage, isApplication, isText, etc.?
>
> That would be OK I guess. An alternative would be to add them to ZnMimeType, then more clients could use them (as far as they are universal enough).

Great.

I'll get around to tidying up my modifications and submit them as a few PRs.

P.S. I've successfully scanned about 35,000 emails.

Thanks again,
Alistair