Hi!
Is there a way to get the mbox for this mailing list? This file is supposed to contain all the mails that have been sent. I would like to try some mail mining…
Cheers,
Alexandre
--
Alexandre Bergel http://www.bergel.eu
The archive is here: http://lists.pharo.org/pipermail/pharo-dev_lists.pharo.org/

Marcus
Thanks!
Alexandre
In reply to this post by abergel
```smalltalk
"Index page of the pharo-dev Pipermail archive"
archiveUrl := 'http://lists.pharo.org/pipermail/pharo-dev_lists.pharo.org' asUrl.
"Parse the index page and collect the links to the gzipped monthly archives"
archive := Soup fromUrl: archiveUrl asString.
monthlyZipLinks := archive findAllTags: [ :t |
    t name = 'a' and: [ ((t attributeAt: 'href') ifNil: [ '' ]) endsWith: 'gz' ] ].
monthlyZipUrls := monthlyZipLinks collect: [ :t | archiveUrl / (t attributeAt: 'href') ].
"Download the first two monthly archives, each into a file named after its last URL segment"
(monthlyZipUrls first: 2) do: [ :e | ZnClient new url: e; downloadTo: e file ].
```

The archives are straight text files in which the individual messages are separated by a seemingly random number of LFs. Since LFs also separate lines within the messages, I don't know how you would tell them apart, except by using PP (PetitParser) to look for message headers, which seem to follow a regular format, e.g. `From alexandre at bergel.eu Wed May 21 15:13:48 2008`
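The header line quoted above is regular enough to recognise without a full parser. The sketch below is in Python rather than Pharo, purely for illustration. It extracts the sender and date from such a line. The regular expression and the name `parse_from_line` are assumptions based only on that one example line. Matching the whole ctime-style date, not just a leading "From ", helps avoid mistaking ordinary body lines that start with "From " for separators.

```python
import re
from datetime import datetime

# Assumed pattern for a Pipermail separator line such as
#   From alexandre at bergel.eu Wed May 21 15:13:48 2008
# Pipermail obfuscates "@" as " at ", so the sender part may contain spaces.
FROM_LINE = re.compile(
    r"^From (?P<sender>.+?) "
    r"(?P<date>[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4})$"
)

def parse_from_line(line):
    """Return (sender, datetime) for a separator line, or None if it is not one."""
    m = FROM_LINE.match(line.rstrip("\n"))
    if m is None:
        return None
    sender = m.group("sender").replace(" at ", "@")
    # ctime-style dates pad single-digit days with an extra space; collapse it first
    date_text = " ".join(m.group("date").split())
    return sender, datetime.strptime(date_text, "%a %b %d %H:%M:%S %Y")
```

For example, `parse_from_line("From alexandre at bergel.eu Wed May 21 15:13:48 2008")` yields the address `alexandre@bergel.eu` and the timestamp 2008-05-21 15:13:48.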
Cheers,
Sean
Oh, I forgot one thing... Unix... why text and files?! whyyyyy?!
Cheers,
Sean
In reply to this post by Sean P. DeNigris
> The archives are straight text files, in which the individual messages are separated by a seemingly random number of LFs.

Actually, they are valid mbox files. (At least my mutt opened them just fine.) The separator is the "From " line, not newlines.
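Splitting on "From " lines, as described above, takes only a few lines of code. This Python sketch is for illustration only, and the name `split_mbox` is hypothetical. It assumes the archive escaped body lines beginning with "From " (for example as ">From ") when it was written. Without that escaping, such body lines would be mistaken for separators.

```python
def split_mbox(text):
    """Split raw mbox text into messages, treating lines that start with 'From ' as separators.

    Simplified sketch: assumes body lines that begin with 'From ' were escaped
    when the file was written, so every remaining 'From ' line starts a message.
    """
    messages = []
    current = None
    for line in text.splitlines(keepends=True):
        if line.startswith("From "):
            # The separator (envelope) line starts a new message; it is not one of its headers
            current = []
            messages.append(current)
        elif current is not None:
            current.append(line)
    return ["".join(lines) for lines in messages]
```

For real files, Python's standard `mailbox.mbox` class already does this splitting and returns parsed message objects.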
In reply to this post by Sean P. DeNigris
> Oh, I forgot one thing...

Not a fan of Unix, I see. :)
In reply to this post by Peter Uhnak
Ah! Thanks :) The .txt extension threw me off. Yes, Mac Mail imports them fine as well.
Cheers,
Sean
In reply to this post by Peter Uhnak
I've been spoiled by living objects. For me, dealing with text and files is like writing in assembler :)
Cheers,
Sean
In reply to this post by Peter Uhnak
2015-07-06 14:29 GMT+02:00 Peter Uhnák <[hidden email]>:
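> The separator is "From " line, not newlines.

"From" followed by a space. Each message ends with a blank line: https://en.wikipedia.org/wiki/Mbox, https://tools.ietf.org/html/rfc4155

It seems there are multiple, incompatible mbox formats.

Thierry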
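One concrete source of that incompatibility is how body lines starting with "From " are quoted. In the variant usually called mboxo, only lines that begin exactly with "From " get a ">" prepended. An original ">From " line therefore looks identical to a quoted "From " line, and the quoting cannot be undone reliably. The mboxrd variant also quotes lines that already start with ">From ", ">>From ", and so on, and a reader removes one ">" from each. That makes mboxrd reversible. The following Python sketch shows both behaviours; the function names are illustrative and do not come from any library.

```python
import re

# mboxrd: on write, every line matching ^>*From  gets one extra '>'; on read, one is removed.
_MBOXRD_NEEDS_QUOTE = re.compile(r"^(>*From )", re.MULTILINE)
_MBOXRD_QUOTED = re.compile(r"^>(>*From )", re.MULTILINE)

def mboxrd_escape(body):
    """Quote a message body for storage in an mboxrd file."""
    return _MBOXRD_NEEDS_QUOTE.sub(r">\1", body)

def mboxrd_unescape(body):
    """Undo mboxrd quoting when reading a message body back."""
    return _MBOXRD_QUOTED.sub(r"\1", body)

def mboxo_escape(body):
    """mboxo quotes only lines starting exactly with 'From ', so it is not reversible."""
    return re.sub(r"^From ", ">From ", body, flags=re.MULTILINE)
```

With mboxo, the bodies `"From x"` and `">From x"` are both stored as `">From x"`. A reader of such a file cannot tell which one was written.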
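I've been doing some mailing list analysis recently (in Ruby), and would be very interested in porting it over to Smalltalk. (I was actually getting really frustrated at the lack of a proper debugging setup in Ruby, even though it had some great mail-related libraries.) I was looking at thread lengths, numbers of unanswered threads, etc.

Alexandre, I haven't been able to find a good mail-parsing library for Smalltalk (preferably one that reads the mbox format natively); I'd be curious to know what you end up using.

As for the download URL: the link Marcus gave is, unfortunately, in Pipermail's own format (a simplified version of mbox, really). To get the actual .mbox file, you'd need to use this link:

http://lists.pharo.org/mailman/private/pharo-dev_lists.pharo.org.mbox/pharo-dev_lists.pharo.org.mbox

(Note that it requires you to authenticate with your mailing list email and password, the ones you created when you first signed up for the mailing list.) But once authenticated, you can download it with Zinc (or wget or whatever) and start processing it.

Let us know how it goes!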
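Here is a rough Python sketch of the thread-length and unanswered-thread counts mentioned above. It assumes each message carries the standard Message-ID, In-Reply-To and References headers. It treats a thread's root as the first ID in References, falling back to In-Reply-To, then to the message's own ID. The function name and that root heuristic are illustrative assumptions. Real archives often have missing or broken headers, and this sketch does not try to repair them.

```python
from collections import defaultdict
from email import message_from_string

def thread_stats(raw_messages):
    """Group raw messages into threads and return ({root_id: size}, unanswered_count).

    Heuristic root: first Message-ID in References, else In-Reply-To, else the
    message's own Message-ID. Messages with none of these share the root ''.
    """
    threads = defaultdict(int)  # root Message-ID -> number of messages in that thread
    for raw in raw_messages:
        msg = message_from_string(raw)
        own_id = (msg.get("Message-ID") or "").strip()
        refs = (msg.get("References") or "").split()
        parent = (msg.get("In-Reply-To") or "").strip()
        root = refs[0] if refs else (parent or own_id)
        threads[root] += 1
    lengths = dict(threads)
    # A thread with a single message never received a reply
    unanswered = sum(1 for n in lengths.values() if n == 1)
    return lengths, unanswered
```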
With ZnHeaders and ZnMimePart you should get a long way in parsing mailboxes. I believe some people have already experimented with this, but I am not sure and I forgot.
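For comparison only, here is the analogous step with Python's standard `email` package. This is not a description of the Zinc API. The sketch parses one raw message (headers plus body), walks its MIME parts and decodes the text/plain ones. The function name `text_parts` is hypothetical.

```python
from email import message_from_string

def text_parts(raw):
    """Return (Subject, [decoded text/plain parts]) for one raw message."""
    msg = message_from_string(raw)
    parts = []
    for part in msg.walk():  # yields the message itself and every nested MIME part
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)  # undoes base64/quoted-printable; returns bytes
            charset = part.get_content_charset() or "us-ascii"
            parts.append(payload.decode(charset, errors="replace"))
    return msg["Subject"], parts
```

A plain, non-multipart message yields a single part, because `walk()` returns the message itself. A multipart/alternative message yields only its text/plain alternative.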
Sven - thanks! I'll try those. Oh, and as far as having to log in first: apparently you can pass in your username and password as part of the URL. So the generic template would be: 'http://www.example.com/mailman/private/LIST.mbox/LIST.mbox?username=U&password=P'
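Here is a small Python sketch that fills in that template. The function name and parameters are hypothetical. Whether a given Mailman server actually accepts credentials in the query string is the "apparently" above, so treat it as an assumption. The detail worth getting right is percent-encoding: a username that is an e-mail address contains "@", which `urlencode` escapes, as it does other reserved characters in the password.

```python
from urllib.parse import urlencode

def private_mbox_url(base, list_name, username, password):
    """Build a private-archive mbox URL following the template quoted in this thread.

    Assumption: the server accepts 'username' and 'password' as query parameters.
    """
    path = f"{base.rstrip('/')}/mailman/private/{list_name}.mbox/{list_name}.mbox"
    return path + "?" + urlencode({"username": username, "password": password})
```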
In reply to this post by abergel
Alex
You should ask Alberto, because maybe he has some libraries that we could port from VW (VisualWorks).
Stef
In reply to this post by Sven Van Caekenberghe-2
I ported some code that does a good job of extracting the interesting parts from an email reply:
http://smalltalkhub.com/#!/~pdebruic/EmailReplyParser

It has examples and can parse raw mails, both text-only and multipart. It's based on what GitHub uses: https://github.com/github/email_reply_parser. I see no reason why it couldn't also be adapted for use with an initial email as well as with replies.
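The core idea is separating the new text of a reply from the quoted history. A deliberately naive Python sketch can illustrate it. This is not the EmailReplyParser or email_reply_parser algorithm, which handles many more cases. The heuristics here are assumptions chosen to fit the messages in this thread: drop ">" lines, drop "On ... wrote:" attribution lines, and stop at the "-- " signature separator. The name `visible_reply` is hypothetical.

```python
import re

# Naive attribution-line pattern, e.g. "On Jul 6, 2015, at 11:53 AM, Marcus Denker wrote:"
_ATTRIBUTION = re.compile(r"^On .+ wrote:\s*$")

def visible_reply(body):
    """Keep only the new text of a reply: drop quoted ('>') lines and
    'On ... wrote:' lines, and cut everything after a '-- ' signature marker."""
    kept = []
    for line in body.splitlines():
        if line == "-- ":  # conventional signature separator ends the useful text
            break
        if line.lstrip().startswith(">") or _ATTRIBUTION.match(line):
            continue
        kept.append(line)
    return "\n".join(kept).strip()
```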
> http://smalltalkhub.com/#!/~pdebruic/EmailReplyParser

If you commit your ConfigurationOfEmailReplyParser to Pharo/MetaRepoForPharo50, it will be available from the Catalog Browser.