I've got a string which is the header section of an email. I have a regex which will split a header field name from its data (i.e., "From: [hidden email]" becomes "From" and "[hidden email]"), but some header lines are long and have been continued by inserting a newline and one or more spaces. Before splitting the fields I need to undo these continuations by deleting these combinations of a newline followed by some whitespace.

This would certainly be trivial in Perl or any of the normal Linux regex engines, but I've spent hours on this today, equipped with the PBE2 chapter, and got nowhere.

How do I do this in Pharo?

Thanks,

Thomas
On 27 May 2013 18:42, Thomas Worthington <[hidden email]> wrote:
> How do I do this in Pharo?

trimBlock := [:string |
    | lines |
    lines := string lines.
    lines collect: #trimmed ].

trimBlock value: 'Header1: fooo
Header2: barrr
Header3: zork'

=>
#('Header1: fooo' 'Header2: barrr' 'Header3: zork')

--
Best regards,
Igor Stasenko.
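Note that the block above trims each line but still returns a continuation line as its own entry. Below is a minimal sketch of extending it so that a line starting with whitespace is folded into the preceding header. It relies on the same lines and trimmed selectors; the name unfoldBlock and the sample header text are hypothetical, and joining the pieces with a single space is an assumption rather than something the question requires.

```smalltalk
"Hypothetical helper: fold continuation lines into the previous header line."
unfoldBlock := [:headerString |
    | unfolded |
    unfolded := OrderedCollection new.
    headerString lines do: [:line |
        (line notEmpty and: [line first isSeparator and: [unfolded notEmpty]])
            ifTrue: [
                "Continuation line: append it to the previous entry, separated by one space."
                unfolded addLast: unfolded removeLast , ' ' , line trimmed]
            ifFalse: [
                "Ordinary header line: keep it as a new entry."
                unfolded addLast: line trimmed]].
    unfolded asArray].

unfoldBlock value: 'Subject: a long' , String cr , '   subject line' , String cr , 'From: someone'.

"=> #('Subject: a long subject line' 'From: someone')"
```

Each resulting entry can then go through the existing name/value regex unchanged.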
Igor,

I don't think that is what he wants.

Thomas,

You can use the built-in ZnHeaders class from the Zinc HTTP Components library/framework:

ZnHeaders readFrom: (String crlf join: 'Foo:1
Bar: foo-
 bar
Final:true
Foo: another-foo' lines) readStream.

=> a ZnHeaders('Bar'->'foo- bar' 'Final'->'true' 'Foo'->#('1' 'another-foo') )

Note that ZnHeaders>>#readFrom: expects CRLF-delimited lines (like the whole of the internet), while Smalltalk uses CRs, hence the little hack. Your input will probably use CRLF already.

Note how ZnHeaders collects the values of identical headers, and how it joins the continued 'foo-' and 'bar' into one value.

HTH,

Sven
In reply to this post by Thomas Worthington-2
* Use regex in Pharo too. It should be more than adequate for what you seek.
* Write your own string-parsing code, along the lines of what Igor suggests, extending it to handle multi-line header fields.
* Use other libraries, like Zinc as Sven suggests (there are similar ones in other packages), though that is a bit contrived and not really what you seek, I presume.
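For the regex option, here is a hedged sketch using String>>copyWithRegex:matchesReplacedWith: from Pharo's Regex package. The pattern is assembled from literal CR, LF, space and tab characters, so it does not depend on which backslash escapes the engine accepts. The variable names and the sample header text are made up for illustration.

```smalltalk
| foldPattern headers unfolded |

"One or more CR/LF characters followed by one or more spaces or tabs."
foldPattern := '[' , String cr , String lf , ']+[ ' , String tab , ']+'.

"Sample input with one folded header (hypothetical data)."
headers := 'Subject: a long' , String crlf , '   subject line' , String crlf , 'From: someone'.

"Replace every fold with a single space; the other line breaks are left alone."
unfolded := headers copyWithRegex: foldPattern matchesReplacedWith: ' '.

"Each element of this collection should now be one complete header line."
unfolded lines.
```

Replacing each match with a single space is a choice made here. Pass an empty string instead to delete the newline and whitespace outright, as the question describes.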