I am trying to write a simple bibliographic parsing program.
Basically, there is a text file which has some 'tagged' lines, e.g. 'AU Smith, John' etc. I thought I would try to write the code so that it read in each line, took the first 2 chars and tried to send that string to itself as a message, with the rest of the line as the argument, i.e. in pseudocode

    msg := (aLine first: 2).
    arg := aLine (the rest of the line!)
    self msg: arg.

The only way I have found to make this work is to use the Compiler evaluate: method. Is that right, or is there a simpler way of getting Smalltalk to execute the code? Also, I can only make this work with numbers and symbols. I can't figure out quite how to escape the text to make it work, i.e.

    Compiler evaluate: 'self doSomething: "argument"'.

Could someone please show me my obvious error!

TVM
AB
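On the escaping question: in Smalltalk, double quotes delimit comments, so "argument" inside the source string is read as a comment and the keyword is left without an argument. String literals use single quotes, and a single quote inside a string literal is written twice. A minimal sketch, assuming the same `Compiler evaluate:` entry point mentioned above; the expression being evaluated is only an illustration, and what `self` refers to inside an evaluated string depends on the dialect.

```smalltalk
| source |
"Double quotes are comments in Smalltalk; a string inside a string needs doubled single quotes."
source := 'Array with: ''Smith, John'' with: 42'.
Compiler evaluate: source.   "evaluates: Array with: 'Smith, John' with: 42"
```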
Andy,
> I thought I would try to write the code so that it read in each line,
> took the first 2 chars and tried to send that string to itself as a
> message, with the rest of the line as the argument, i.e. in pseudocode
>
> msg := (aLine first:2).
> arg := aLine - the rest of the line!)
>
> self msg: arg.

You could probably do it with:

    tag := aLine first: 2.
    arg := aLine allButFirst: 2. "or whatever"
    msg := (tag , ':') asSymbol.
    self perform: msg with: arg.

but there are usually more straightforward and reliable (and boring ;-) ways of parsing than invoking an arbitrary #perform: (what if the first two characters don't form a recognised method name ? Worse, what if they do, but not one you want to have called ??).

If I were doing this as a serious project, I would probably split off the tag and look it up in a fixed Dictionary of Blocks, and then evaluate the Block passing the string as its argument. There are other techniques, but that is simple and robust.

    -- chris
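The dictionary-of-blocks idea might look something like the following workspace sketch. The variable names, the `record` dictionary and the choice to ignore unknown tags silently are all assumptions made for illustration, not part of the suggestion above.

```smalltalk
| handlers record line tag rest |
record := Dictionary new.          "hypothetical holder for the parsed fields"
handlers := Dictionary new.
handlers
    at: 'AU' put: [:text | record at: #author put: text];
    at: 'TI' put: [:text | record at: #title put: text].

line := 'AU Smith, John'.
tag := line copyFrom: 1 to: 2.
rest := line copyFrom: 4 to: line size.    "assumes exactly one space after the tag"

"Unknown tags fall through to the ifAbsent: block and are ignored."
(handlers at: tag ifAbsent: [[:text | nil]]) value: rest.
record at: #author    "'Smith, John'"
```

Because only tags that were put into the dictionary can ever run code, a stray two-letter prefix in the input cannot trigger an unrelated method.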
In reply to this post by Andy Burnett
Hi andy,
> I am trying to write a simple bibliographic parsing program.
> Basically, there is a text file which has some 'tagged' lines e.g. 'AU
> Smith, John' etc.

Another way is to look at SmaCC (http://www.refactory.com/Software/SmaCC/index.html), and write a little grammar. It's not as hard as you might think and there are some good examples in there. It depends on whether you can express your stuff as a grammar. Either way, it's fun trying out their little calculator example.

Tim
In reply to this post by Andy Burnett
> I thought I would try to write the code so that it read in each line,
> took the first 2 chars and tried to send that string to itself as a
> message, with the rest of the line as the argument, i.e. in pseudocode
>
> msg := (aLine first:2).
> arg := aLine - the rest of the line!)
>
> self msg: arg.
>

Maybe this (the selector needs the trailing colon to take an argument):

    selector := ((aLine first: 2) , ':') asSymbol.
    arg := aLine copyFrom: 3 to: aLine size. "the rest of the line"
    self perform: selector with: arg.
Brilliant, thanks to you all. I didn't know about the perform: method.
I only stumbled across evaluate: because that exists in other languages and I thought I would give it a try in the methods browser. I will certainly look at SmaCC, I am always happy to learn new things.

My plan for tags which didn't exist was to redefine the doesNotUnderstand: message in my class so that it just returned. I was actually quite pleased with that idea as it seemed a neat way of avoiding the equivalent of a switch statement. However, it may actually be opening me up to all sorts of terrible outcomes which I don't yet understand!

I love the idea of using the dictionary with the code blocks. I hadn't even thought of that. Presumably, I would need to send something like:

    aDictionary at: tagName value: restOfLine.

Thanks
AB
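One risk of overriding doesNotUnderstand: to return quietly is that every misspelled selector sent to that object is swallowed as well, not just unknown tags. A gentler guard is to ask the receiver first with respondsTo:. The sketch below uses an OrderedCollection as a stand-in for the parser object and #add: as a stand-in for a selector built from a tag; both are assumptions chosen only so the snippet runs in a workspace.

```smalltalk
| target selector |
target := OrderedCollection new.       "stand-in for the parser object"
selector := ('ad' , 'd:') asSymbol.    "stand-in for a selector built from a tag"
(target respondsTo: selector)
    ifTrue: [target perform: selector with: 'Smith, John'].
(target respondsTo: #AU:)
    ifTrue: [target perform: #AU: with: 'ignored'].   "skipped: no such method"
target size    "1"
```

This still does not address the second concern raised earlier in the thread, namely a tag that happens to match a method you never meant to expose.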
[hidden email] wrote:
>...
> I love the idea of using the dictionary with the code blocks. I hadn't
> even thought of that. Presumably, I would need to send something like:
>
> aDictionary at: tagName value: restOfLine.
>

Actually, you need some parentheses (or a variable) for that to work, like:

    (aDictionary at: tagName) value: restOfLine
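The reason is that, without parentheses, the whole expression is read as a single keyword message #at:value: sent to the dictionary, which it does not understand. A small workspace sketch of both working forms; the block stored under 'AU' is made up for the example.

```smalltalk
| blocks block |
blocks := Dictionary new.
blocks at: 'AU' put: [:text | text size].

"Parenthesised: #at: runs first, then #value: goes to the block it answers."
(blocks at: 'AU') value: 'Smith, John'.    "11"

"The same thing with a temporary variable."
block := blocks at: 'AU'.
block value: 'Smith, John'.                "11"
```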
In reply to this post by Andy Burnett
[hidden email] wrote:
> I love the idea of using the dictionary with the code blocks. I hadn't
> even thought of that. Presumably, I would need to send something like:
>
> aDictionary at: tagName value: restOfLine.

You might also want to look at the #at:ifPresent: and #at:ifAbsent: methods of LookupTable.

BTW, the last time I used the dictionary-of-blocks approach, I did it slightly differently and passed a ReadStream to the block instead of the remainder of the input string. For my app, it made the code rather nice (but then, I like streams ;-) Something like:

    input := ...whatever...
    stream := input readStream.
    tag := stream next: 2.
    blocks at: tag ifPresent: [:it | it value: stream].

I'm not saying that's generally better, but it worked well for me in that one application, and it's an option which might not have occurred to you.

    -- chris
Thanks, that is an interesting thought. At the moment I am using

    (aFileStream atEnd) whileFalse: [self process: aFileStream nextLine]

Taking your approach, would I pass the result of nextLine to the block in the dictionary? I feel that there should be some clever recursive approach here. I just can't see it yet.

Cheers
AB
andy,
> At the moment I am using
>
> (aFileStream atEnd) whileFalse: [self process: aFileStream nextLine]
>
> Taking your approach, would I pass the result of nextLine to the block
> in the dictionary. I feel that there should be some clever recursive
> approach here. I just can't see it yet.

Please note: I'm not /recommending/ anything here -- this is just chatter about possibilities.

Your approach, above, has the nice property that the code for dealing with each "record" (or whatever you want to call it) doesn't need to know that they are represented as separate lines in a file -- they could have come from rows in a database table for instance. It keeps the code for identifying records in the file separate from the code for handling each record once identified.

But if you are willing to dispense with that, then another option would be to push more of the responsibility for parsing onto the code invoked via the blocks. Something like

    [aStream atEnd] whileFalse:
        [| tag parser |
        tag := self readTagFrom: aStream.
        parser := parsingBlocks at: tag.
        parser value: aStream].

In that, the code invoked by each block is responsible for consuming as much of the stream as it needs to (including the terminating end-of-line). One application of the extra flexibility that buys you would be if some (but not necessarily all) kinds of records were allowed to span several lines.

Of course, that extra flexibility isn't free -- you have to give up the "nice" property to buy it. But I probably wouldn't even have considered that scheme except that I like working with streams wherever possible -- I just find it easier to think about consuming data from streams than splitting strings up into substrings. Maybe I'm odd ;-)

    -- chris
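A self-contained workspace sketch of that stream-driven loop follows. The sample input, the `record` dictionary, the use of a bare CR as the line terminator and the fallback that skips unknown records are all assumptions made for illustration; a real file may use a different line ending and would need a guard against a truncated final line.

```smalltalk
| input stream parsers record |
record := Dictionary new.
parsers := Dictionary new.
"Each block consumes its own record, including the line terminator."
parsers
    at: 'AU' put: [:s | record at: #author put: (s skip: 1; upTo: Character cr)];
    at: 'TI' put: [:s | record at: #title put: (s skip: 1; upTo: Character cr)].

input := 'AU Smith, John' , (String with: Character cr) , 'TI Parsing with blocks'.
stream := input readStream.
[stream atEnd] whileFalse: [
    | tag |
    tag := stream next: 2.
    "Unknown tags: skip to the end of the line."
    (parsers at: tag ifAbsent: [[:s | s upTo: Character cr]]) value: stream].
record at: #title   "'Parsing with blocks'"
```

A block for a multi-line record type would simply keep reading past the first terminator until it recognised the end of its own record.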
Chris,
> But I probably wouldn't even have considered that scheme except that I like
> working with streams wherever possible -- I just find it easier to think about
> consuming data from streams than splitting strings up into substrings. Maybe
> I'm odd ;-)

I'm _certain_ I'm odd :) I agree about streams. One can usually note the current position and return to it if a subsequent read attempt fails. I have a few gizmos that iteratively instantiate would-be readers, moving forward only when one of them succeeds. Better still, sometimes one can read ahead just enough to tell which reader is appropriate.

Of course, scanner/parser experts are probably screaming over this thread :)

Have a good one,

Bill

--
Wilhelm K. Schwab, Ph.D.
[hidden email]
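The note-the-position-and-return idea can be sketched roughly as below. This is not the "gizmos" mentioned above, just a guess at the general shape: each hypothetical reader is a block that answers a value on success or nil on failure, and the stream is rewound after every failed attempt.

```smalltalk
| stream start readers result |
stream := 'AU Smith, John' readStream.
"Each hypothetical reader answers a value on success or nil on failure."
readers := OrderedCollection new.
readers
    add: [:s | (s next: 2) = 'TI' ifTrue: [s upToEnd] ifFalse: [nil]];
    add: [:s | (s next: 2) = 'AU' ifTrue: [s upToEnd] ifFalse: [nil]].
result := nil.
readers do: [:each |
    result isNil ifTrue: [
        start := stream position.
        result := each value: stream.
        "Rewind so the next reader sees the same input."
        result isNil ifTrue: [stream position: start]]].
result    "' Smith, John'"
```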
Bill,
> Of course, scanner/parser experts are probably screaming over this thread

Bah ! Humbug to 'em.

    -- chris