Getting the mbox file for this mailing list?


abergel
Hi!

Is there a way to get the mbox for this mailing list? This file is supposed to contain all the mails that have been sent.
I would like to try to do some mail mining…

Cheers,
Alexandre

--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel  http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.





Re: Getting the mbox file for this mailing list?

Marcus Denker-4

On 06 Jul 2015, at 11:50, Alexandre Bergel <[hidden email]> wrote:

> Hi!
>
> Is there a way to get the mbox for this mailing list? This file is supposed to contain all the mails that have been sent.
> I would like to try to do some mail mining…


The archive is here:

http://lists.pharo.org/pipermail/pharo-dev_lists.pharo.org/

Marcus


Re: Getting the mbox file for this mailing list?

abergel
Thanks!

Alexandre


> On Jul 6, 2015, at 11:53 AM, Marcus Denker <[hidden email]> wrote:
>
>
>> On 06 Jul 2015, at 11:50, Alexandre Bergel <[hidden email]> wrote:
>>
>> Hi!
>>
>> Is there a way to get the mbox for this mailing list. This file is supposed to contains all the mails that have been sent.
>> I would like to try to do some mail mining…
>>
>
> The archive is here:
>
> http://lists.pharo.org/pipermail/pharo-dev_lists.pharo.org/
>
>
> Marcus
>






Re: Getting the mbox file for this mailing list?

Sean P. DeNigris
Administrator
In reply to this post by abergel
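"Scrape the Pipermail index page for the links to the monthly .gz archives"
archiveUrl := 'http://lists.pharo.org/pipermail/pharo-dev_lists.pharo.org' asUrl.
archive := Soup fromUrl: archiveUrl asString.
"Guard against <a> tags without an href, in case attributeAt: answers nil for them"
monthlyZipLinks := archive findAllTags: [ :t |
        t name = 'a' and: [ ((t attributeAt: 'href') ifNil: [ '' ]) endsWith: '.gz' ] ].
monthlyZipUrls := monthlyZipLinks collect: [ :t | archiveUrl / (t attributeAt: 'href') ].
"Only the first two months, as a test; drop 'first: 2' to download every month"
(monthlyZipUrls first: 2) do: [ :e |
        ZnClient new
                url: e;
                downloadTo: e file ].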

The archives are straight text files, in which the individual messages are separated by a seemingly random number of LFs. Since LFs also separate lines within the messages, I don't know how you would tell them apart, except by using PP (PetitParser) to look for the message headers, which seem to follow a regular format, e.g. `From alexandre at bergel.eu  Wed May 21 15:13:48 2008`
Cheers,
Sean

Re: Getting the mbox file for this mailing list?

Sean P. DeNigris
Administrator
Sean P. DeNigris wrote:
> The archives are straight text files
Oh, I forgot one thing...
Unix... why text and files?! whyyyyy?!
Cheers,
Sean

Re: Getting the mbox file for this mailing list?

Peter Uhnak
In reply to this post by Sean P. DeNigris
> The archives are straight text files, in which the individual messages are
> separated by a seemingly random number of LFs.

Actually they are valid mbox files. (At least my mutt opened it just fine.)
The separator is "From " line, not newlines.
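A minimal Pharo sketch of that convention, assuming the archive text is already in memory: a new message starts at every line beginning with "From " (the envelope line described in RFC 4155), and every other line is appended to the message currently being collected. It also assumes that a body line starting with "From " was escaped as ">From " when the archive was written, as the mboxo/mboxrd variants do; `splitMbox` and the sample text are made up for illustration.

```smalltalk
| splitMbox sample messages |
"splitMbox is a hypothetical helper block: it answers one string (headers + body)
 per message and drops the 'From ' envelope line that precedes each message"
splitMbox := [ :mboxText |
	| result current |
	result := OrderedCollection new.
	current := nil.
	mboxText lines do: [ :line |
		(line beginsWith: 'From ')
			ifTrue: [
				"Envelope line: start collecting a new message"
				current := WriteStream on: String new.
				result add: current ]
			ifFalse: [
				"Header or body line: append it to the message being collected, if any"
				current ifNotNil: [ current nextPutAll: line; nextPut: Character lf ] ] ].
	result collect: [ :each | each contents ] ].

"Made-up sample shaped like the Pipermail .txt archives"
sample := 'From alice at example.com  Mon Jul  6 11:50:00 2015
Subject: first

Hello

From bob at example.com  Mon Jul  6 11:53:00 2015
Subject: Re: first

>From the quoted text
'.
messages := splitMbox value: sample.
messages size
```

On this sample the escaped ">From" line stays inside the second message, so two messages come out. For a real month, the downloaded .txt.gz file would first need to be decompressed and read into a string.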

Re: Getting the mbox file for this mailing list?

Peter Uhnak
In reply to this post by Sean P. DeNigris
> Oh, I forgot one thing...
> Unix... why text and files?! whyyyyy?!

Not a fan of Unix, I see. :)

Re: Getting the mbox file for this mailing list?

Sean P. DeNigris
Administrator
In reply to this post by Peter Uhnak
Peter Uhnák wrote:
> Actually they are valid mbox files. (At least my mutt opened it just fine.)
Ah! Thanks :) The .txt extension threw me off. Yes, Mac Mail imports them fine as well.
Cheers,
Sean

Re: Getting the mbox file for this mailing list?

Sean P. DeNigris
Administrator
In reply to this post by Peter Uhnak
Peter Uhnák wrote:
> Not a fan of Unix, I see. :)
I've been spoiled by living objects. For me, dealing with text and files is like writing in assembler :)
Cheers,
Sean

Re: Getting the mbox file for this mailing list?

Thierry Goubier
In reply to this post by Peter Uhnak


2015-07-06 14:29 GMT+02:00 Peter Uhnák <[hidden email]>:
>> The archives are straight text files, in which the individual messages are
>> separated by a seemingly random number of LFs.
>
> Actually they are valid mbox files. (At least my mutt opened it just fine.)
> The separator is "From " line, not newlines.

"From" followed by a space. Each message ends with a blank line.

https://en.wikipedia.org/wiki/Mbox, https://tools.ietf.org/html/rfc4155

It seems there are multiple, incompatible mbox formats.

Thierry


Re: Getting the mbox file for this mailing list?

Dmitri Zagidulin
I've been doing some mailing list analysis recently (in Ruby), and would be very interested in porting it over to Smalltalk. (I was actually getting really frustrated at the lack of proper debugging setup in Ruby, even though it had some great mail-related libraries). I was looking at thread lengths, numbers of unanswered threads, etc.

Alexandre -- I haven't been able to find a good mail-parsing library for Smalltalk (preferably one that reads the mbox format natively), so I'd be curious to know what you end up using.

As for the download URL -- the link Marcus gave is, unfortunately, in Pipermail's own format (a simplified version of mbox, really).
To get the actual .mbox file, you'd need to use this link:

http://lists.pharo.org/mailman/private/pharo-dev_lists.pharo.org.mbox/pharo-dev_lists.pharo.org.mbox

(Note that it requires you to authenticate with your mailing-list email address and the password you created when you first signed up for the list.) But once authenticated, you can download it with Zinc (or wget, or whatever) and start processing it.

Let us know how it goes!




Re: Getting the mbox file for this mailing list?

Sven Van Caekenberghe-2
With ZnHeaders and ZnMimePart you should get a long way in parsing mail boxes. I believe some people have already experimented with this, but I am not sure and I forgot.
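For instance, the header block of a single message (the text after its "From " line) could be handed to `ZnHeaders`. This is a sketch under assumptions: ZnHeaders comes from Zinc's HTTP code, so the made-up sample uses CRLF line ends (lines from an mbox file end in LF and may need converting), `ZnHeaders readFrom:` is assumed to accept a character stream and to stop after the empty line that ends the headers, and folded multi-line mail headers may not be handled.

```smalltalk
| crlf rawMessage stream headers body |
crlf := String crlf.
"Made-up message: header lines, an empty line, then the body.
 CRLF line ends are used because ZnHeaders comes from Zinc's HTTP code."
rawMessage := 'Subject: Getting the mbox file for this mailing list?' , crlf ,
	'From: alexandre at bergel.eu' , crlf ,
	'Message-ID: <abc123@example.com>' , crlf ,
	crlf ,
	'Hi!' , crlf.
stream := rawMessage readStream.
"Read the header lines up to the empty separator line"
headers := ZnHeaders readFrom: stream.
"Whatever is left on the stream is the body"
body := stream upToEnd.
headers at: 'Subject'
```

Header lookup with `at:` should be case-insensitive, which helps with the inconsistent capitalisation found in real mail. Multipart bodies would be the job of ZnMimePart.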

> On 06 Jul 2015, at 16:11, Dmitri Zagidulin <[hidden email]> wrote:
>
> I've been doing some mailing list analysis recently (in Ruby), and would be very interested in porting it over to Smalltalk. (I was actually getting really frustrated at the lack of proper debugging setup in Ruby, even though it had some great mail-related libraries). I was looking at thread lengths, numbers of unanswered threads, etc.
>
> Alexandre -- I haven't been able to find a good Mail parsing library for Smalltalk (preferably one that reads the Mbox format natively), I'd be curious to know what you end up using.
>
> As for the download URL -- the link Marcus gave is, unfortunately, in Piper-mail's own format (a simplified version of mbox, really).
> To get the actual .mbox file, you'd need to use this link:
>
> http://lists.pharo.org/mailman/private/pharo-dev_lists.pharo.org.mbox/pharo-dev_lists.pharo.org.mbox
>
> (Note that it requires you to authenticate with your mailing list email and password (that you created when you first signed up for the mailing list)). But once authenticated, you can download it with Zinc (or wget) or whatever, and start processing it.
>
> Let us know how it goes!
>
>
> On Mon, Jul 6, 2015 at 8:41 AM, Thierry Goubier <[hidden email]> wrote:
>
>
> 2015-07-06 14:29 GMT+02:00 Peter Uhnák <[hidden email]>:
> The archives are straight text files, in which the individual messages are
> separated by a seemingly random number of LFs.
>
> Actually they are valid mbox files. (At least my mutt opened it just fine.)
> The separator is "From " line, not newlines.
>
> From followed by a space. Each message ends with an blank line
>
> https://en.wikipedia.org/wiki/Mbox, https://tools.ietf.org/html/rfc4155
>
> It seems there are multiple, incompatible mbox formats.
>
> Thierry
>
>



Re: Getting the mbox file for this mailing list?

Dmitri Zagidulin
Sven - thanks! I'll try those.

Oh, and as for having to log in first: apparently you can pass your username and password as part of the URL. So the generic template would be:
'http://www.example.com/mailman/private/LIST.mbox/LIST.mbox?username=U&password=P'
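Following that template, a Zinc sketch of the authenticated download could look like this. The e-mail address and password are placeholders, and the query-parameter login is the Mailman behaviour described above, not verified here. Over plain http the password travels in clear text.

```smalltalk
| client |
"The e-mail address and password are placeholders for your own list credentials"
client := ZnClient new
	url: 'http://lists.pharo.org/mailman/private/pharo-dev_lists.pharo.org.mbox/pharo-dev_lists.pharo.org.mbox';
	queryAt: 'username' put: 'you@example.com';
	queryAt: 'password' put: 'your-list-password';
	yourself.
"Nothing has been sent yet: check the resulting URL first"
client request url
```

Once the URL looks right, `client downloadTo: 'pharo-dev_lists.pharo.org.mbox'` should save the archive under that name, relative to the working directory. A failed login may come back as an HTML login page rather than as an error, so checking that the file starts with `From ` is a cheap sanity test.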



On Mon, Jul 6, 2015 at 10:15 AM, Sven Van Caekenberghe <[hidden email]> wrote:
> With ZnHeaders and ZnMimePart you should get a long way in parsing mail boxes. I believe some people have already experimented with this, but I am not sure and I forgot.





Re: Getting the mbox file for this mailing list?

stepharo
In reply to this post by abergel
Alex

you should ask Alberto, because maybe he has some libraries that we
could port from VW (VisualWorks).

Stef

On 6/7/15 11:50, Alexandre Bergel wrote:
> Hi!
>
> Is there a way to get the mbox for this mailing list. This file is supposed to contains all the mails that have been sent.
> I would like to try to do some mail mining…
>
> Cheers,
> Alexandre
>



Re: Getting the mbox file for this mailing list?

Paul DeBruicker
In reply to this post by Sven Van Caekenberghe-2
I ported some code that does a good job of extracting the interesting parts from an email reply:


http://smalltalkhub.com/#!/~pdebruic/EmailReplyParser

It has examples and can parse raw mails, as well as text-only or multipart emails.

It's based on what GitHub uses:

 https://github.com/github/email_reply_parser


I see no reason why it couldn't also be adapted for use with an initial email, as well as the replies.  





Re: Getting the mbox file for this mailing list?

Peter Uhnak
> http://smalltalkhub.com/#!/~pdebruic/EmailReplyParser

If you commit your ConfigurationOfEmailReplyParser to Pharo/MetaRepoForPharo50, it will be available from the Catalog Browser.
 


--
View this message in context: http://forum.world.st/Getting-the-mbox-file-for-this-mailing-list-tp4835958p4836140.html
Sent from the Pharo Smalltalk Developers mailing list archive at Nabble.com.