Email parser

Email parser

Günther Schmidt
Hi,

is there already a package for email parsing out there?

I'm using the POP and SMTP package from Jose S. Calvo and receiving
mails from a POP Server works fine, as expected, but the mail is just
one string.

Günther


Re: Email parser

Esteban A. Maringolo-2
Günther Schmidt wrote:
> Hi,
>
> is there already a package for email parsing out there?
>
> I'm using the POP and SMTP package from Jose S. Calvo and receiving
> mails from a POP Server works fine, as expected, but the mail is just
> one string.

It could be very easy to split parts (headers, body(ies), and
multipart messages).

Best regards.

--
Esteban A. Maringolo
[hidden email]


Re: Email parser

Günther Schmidt
Hi Esteban,

it might indeed be, but if somebody has already done the work, why
should I do it again?

;-)

Günther

Esteban A. Maringolo wrote:

> Günther Schmidt wrote:
>
>> Hi,
>>
>> is there already a package for email parsing out there?
>>
>> I'm using the POP and SMTP package from Jose S. Calvo and receiving
>> mails from a POP Server works fine, as expected, but the mail is just
>> one string.
>
>
> It could be very easy to split parts (headers, body(ies), and multipart
> messages).
>
> Best regards.
>


Re: Email parser

Stefan Schmiedl
In reply to this post by Günther Schmidt
On Thu, 10 Feb 2005 18:20:44 +0100,
Günther Schmidt <[hidden email]> wrote:
> Hi,
>
> is there already a package for email parsing out there?

I don't know, but ...

>
> I'm using the POP and SMTP package from Jose S. Calvo and receiving
> mails from a POP Server works fine, as expected, but the mail is just
> one string.

... this is actually nice and simple.

Break the string into lines. Everything up to the first empty line is
the header, the rest is the message body. Each header item has a name
tag, followed by a colon, followed by the value. Header lines beginning
with whitespace are continuation lines.
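
A minimal workspace sketch of those steps, assuming a Dolphin image. The sample message, the variable names and the use of a LookupTable are illustrative assumptions, not part of any existing package, and lines without a colon are simply skipped:

```smalltalk
"Split a raw message into unfolded header lines and a body."
| raw stream line headerLines body fields |
raw := 'From: alice@example.com
Subject: A long
 subject line

Hello Bob.'.
stream := raw readStream.
headerLines := OrderedCollection new.
"Everything up to the first empty line is the header."
[stream atEnd not and: [(line := stream nextLine) notEmpty]] whileTrue: [
	(line first isSeparator and: [headerLines notEmpty])
		ifTrue: [
			"Continuation line: append it to the previous header line."
			headerLines addLast: headerLines removeLast , line]
		ifFalse: [headerLines add: line]].
"The rest is the message body."
body := stream upToEnd.
"Each header line is a name, a colon, then the value."
fields := LookupTable new.
headerLines do: [:each | | colon |
	colon := each indexOf: $:.
	colon > 0 ifTrue: [
		fields
			at: (each copyFrom: 1 to: colon - 1)
			put: (each copyFrom: colon + 1)]].
fields inspect
```

Note that this sketch keeps only the last value when a header name occurs more than once, and the values keep the blank that follows the colon.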

Have fun :-)

s.


Re: Email parser

Esteban A. Maringolo-2
In reply to this post by Günther Schmidt
Günther Schmidt wrote:
> Hi Esteban,
>
> it might indeed be, but if somebody has already done the work, why
> should I do it again?

Depending on the complexity, time and money available, building your
own is always recommended. That is my personal opinion.

Rebuilding is not the same as reinventing.

But I went off topic.

--
Esteban A. Maringolo
[hidden email]


Re: Email parser

Ian Bartholomew-19
In reply to this post by Günther Schmidt
Günther,

> is there already a package for email parsing out there?
>
> I'm using the POP and SMTP package from Jose S. Calvo and receiving
> mails from a POP Server works fine, as expected, but the mail is just
> one string.

If you have a look at my NewsArchiveBrowser, specifically
NewsArchiveArticle>>parse, then it might give you some ideas.  It's for
parsing newsgroup messages (and only looks for the Subject/From/Sent
headers) but the email format is similar.

Please note (and it's mentioned a number of times in comments) that the
code in that area is written for _speed_ and shouldn't be taken as any
sort of ST coding example :-)

--
Ian

Use the Reply-To address to contact me.
Mail sent to the From address is ignored.


Re: Email parser

Günther Schmidt
Ian,

thanks.

Ian Bartholomew wrote:

> Günther,
>
>> is there already a package for email parsing out there?
>>
>> I'm using the POP and SMTP package from Jose S. Calvo and receiving
>> mails from a POP Server works fine, as expected, but the mail is just
>> one string.
>
>
> If you have a look at my NewsArchiveBrowser, specifically
> NewsArchiveArticle>>parse, then it might give you some ideas.  It's for
> parsing newsgroup messages (and only looks for the Subject/From/Sent
> headers) but the email format is similar.
>
> Please note (and it's mentioned a number of times in comments) that the
> code in that area is written for _speed_ and shouldn't be taken as any
> sort of ST coding example :-)

I haven't looked at your code yet while writing this email, but I
already have a question up front. :-)

If you were to do it in *good* Smalltalk style, how would you write it?
I'm asking because I reckon that, with my little knowledge, I would
already be able to write appropriate code to *parse* the email, though
the code would probably be quite procedural.
Would you use an FSA?

Günther



Re: Email parser

Ian Bartholomew-19
Guenther,

> If you were to do it in *good* Smalltalk style, how would you write it?
> I'm asking because I reckon that, with my little knowledge, I would
> already be able to write appropriate code to *parse* the email, though
> the code would probably be quite procedural.
> Would you use an FSA?

No, it's quite a simple task and doesn't need anything so heavy.  All I
meant with my comment was that there are places where I haven't followed
the "normal" way of doing things but have "cheated" a bit to gain a
speed improvement.  For example, in the #parse method I have used a
string search for a line delimiter rather than the more "natural" use of
a Stream and #nextLine.
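
To illustrate the difference, here is a rough workspace sketch of both styles. It is not the code from NewsArchiveArticle>>parse; the sample text and variable names are made up, and it assumes the text uses the platform line delimiter:

```smalltalk
| text eol stream streamLines searchLines start end |
eol := String lineDelimiter.
text := 'Subject: Hi' , eol , 'From: someone' , eol.

"Stream-based: the more natural form."
stream := text readStream.
streamLines := OrderedCollection new.
[stream atEnd] whileFalse: [streamLines add: stream nextLine].

"Search-based: find each delimiter and copy the line out directly."
searchLines := OrderedCollection new.
start := 1.
[(end := text indexOfSubCollection: eol startingAt: start) > 0] whileTrue: [
	searchLines add: (text copyFrom: start to: end - 1).
	start := end + eol size].
"Pick up a final line that has no trailing delimiter."
start <= text size ifTrue: [searchLines add: (text copyFrom: start to: text size)].
searchLines inspect
```

Whether the search-based form is actually faster depends on the image and the data, so it is worth measuring before giving up the more readable stream version.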

FWIW, a quick play has come up with the following class, which parses an
email string into a header table and a text.  The only (slightly)
complex bits are allowing for headers that cover multiple lines and
headers that appear more than once.

NB I've only tested it with one e-mail, so no guarantees :-)

To test, evaluate

EMailMessage from: aString

where aString is the contents of an email message.

--- cut here ---

"Filed out from Dolphin Smalltalk XP"!

Object subclass: #EMailMessage
        instanceVariableNames: 'headers text lastHeader'
        classVariableNames: ''
        poolDictionaries: ''
        classInstanceVariableNames: ''!
EMailMessage guid: (GUID fromString: '{AB70CD3A-4F03-4997-894E-299B3BB6B87E}')!
EMailMessage comment: ''!
!EMailMessage categoriesForClass!Kernel-Objects! !
!EMailMessage methodsFor!

from: aString
        | stream line |
        headers := LookupTable new.
        stream := aString readStream.
        [stream atEnd not and: [(line := stream nextLine) notEmpty]]
                whileTrue: [self parseHeaderFrom: line].
        text := stream upToEnd!

parseHeaderFrom: aString
        | header headerValue |
        aString first isLetter
                ifTrue:
                        [header := aString readStream upTo: $:.
                        headerValue := aString copyFrom: header size + 2.
                        [headers includesKey: header] whileTrue: [header := header , 'X'].
                        headers at: header put: headerValue.
                        lastHeader := header]
                ifFalse: [headers at: lastHeader put: (headers at: lastHeader) , aString]! !
!EMailMessage categoriesFor: #from:!public! !
!EMailMessage categoriesFor: #parseHeaderFrom:!public! !

!EMailMessage class methodsFor!

from: aString
        ^super new from: aString! !
!EMailMessage class categoriesFor: #from:!public! !

--- cut here ---
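
A possible way to try the class from a workspace once it has been filed in. The sample message below is invented for illustration, and since the class defines no accessors, the sketch simply opens an inspector on the result:

```smalltalk
| raw message |
raw := 'From: alice@example.com
Received: first hop
Received: second hop
Subject: A long
 subject line

Hello Bob.'.
message := EMailMessage from: raw.
"Look at the headers and text instance variables in the inspector."
message inspect
```

Reading the code, the second Received header should end up under the key 'ReceivedX', the folded Subject line should be joined into a single value, and each value should keep the blank that follows the colon, because #parseHeaderFrom: skips only the colon itself.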

--
Ian

Use the Reply-To address to contact me.
Mail sent to the From address is ignored.