Hi.
I'm in the middle of porting someone else's code from Dolphin 3.0.6 to 5.1, and I ran into one difference that surprised me. The old 'PositionableStream>>nextWord' turned into 'SequencedStream>>nextWord', but in the process its algorithm also changed. The new version reads the next word and also skips the first separator after it, while the old version did not skip the following separator.

That just broke a lot of code here when porting. Well, to be perfectly honest, the only things that broke are some ugly hand-crafted parsers which don't deserve to live anyway, but still - it would have been nice to be able to run the application without needing to fix those parsers.

Oh, and did I mention - no unit tests for this code... Argh!!! :-))

Best regards,
Jurko
Hi.
Ok, the more I think about this - the more it bugs me. Is there any general standard ruling on whether nextWord should or should not eat the separator following the word?

My feeling is that it shouldn't, but more than that, my feeling is that this should be specified and standardized. Otherwise it is too easy to make a mistake, like reading a word and then trying to 'push it back' on the stream by doing something like 'aStream skip: aWord size negated'.

Thanks in advance.

Best regards,
Jurko
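To make the 'push back' pitfall concrete, here is a minimal sketch in Python that models both behaviours. It is not Dolphin's implementation: `CharStream`, `next_word`, and the `eat_trailing_separator` flag are hypothetical names invented for the illustration, and the sketch returns an empty string at end of stream, which may differ from what Dolphin does.

```python
class CharStream:
    """Hypothetical stand-in for a read stream over a string."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def at_end(self):
        return self.pos >= len(self.text)

    def skip(self, n):
        # Models 'aStream skip: n'; a negative n moves backwards.
        self.pos = max(0, min(len(self.text), self.pos + n))

    def rest(self):
        # Everything that has not been consumed yet.
        return self.text[self.pos:]


def next_word(stream, eat_trailing_separator):
    """Read the next word; optionally consume ONE separator after it."""
    # Skip every leading separator.
    while not stream.at_end() and stream.text[stream.pos].isspace():
        stream.pos += 1
    start = stream.pos
    # Collect characters up to the next separator or end of stream.
    while not stream.at_end() and not stream.text[stream.pos].isspace():
        stream.pos += 1
    word = stream.text[start:stream.pos]
    # The behaviour described for 5.1: also eat the first separator.
    if eat_trailing_separator and not stream.at_end():
        stream.pos += 1
    return word


# Behaviour described for 3.0.6: the push-back lands on the word's start.
old = CharStream("foo bar")
word = next_word(old, eat_trailing_separator=False)
old.skip(-len(word))
old_rest = old.rest()   # "foo bar"

# Behaviour described for 5.1: the same push-back is off by one.
new = CharStream("foo bar")
word = next_word(new, eat_trailing_separator=True)
new.skip(-len(word))
new_rest = new.rest()   # "oo bar"
```

Under the newer behaviour the push-back would have to skip back one extra position, but only when a separator was actually consumed, which the caller cannot tell from the returned word alone.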
Jurko,
> Ok, the more I think about this - the more it bugs me. Is
> there any general standard ruling on whether nextWord should
> or should not eat the separator following the word?

ANSI doesn't mention a #nextWord selector so there is no "official" definition. Dolphin tends to use VW as a benchmark in this sort of situation so it may be (I haven't got VW to try it on) that the change in behaviour was made with VW compatibility in mind.

> My feeling is that it shouldn't

I would tend to agree with you but I can't say I have a strong feeling on the subject.

> but more than that my feeling is
> that this should be specified and standardized,

Agreed, but who sets the standard? The same thing applies to #nextLine - should it absorb the line delimiter from the stream or not.

--
Ian

Use the Reply-To address to contact me.
Mail sent to the From address is ignored.
Jurko, Ian,
> Dolphin tends to use VW as a benchmark in this sort of situation so it
> may be (I haven't got VW to try it on) that the change in behaviour was
> made with VW compatibility in mind.

As far as I can tell #nextWord is unique to Dolphin (or maybe it comes from VSE) -- VW and GNUSt don't seem to have it, and though VASt and Squeak do have the selector, its semantics there are more like Dolphin's #nextWORD.

> > My feeling is that it shouldn't
>
> I would tend to agree with you but I can't say I have a strong feeling
> on the subject.

Likewise, I have no strong opinion. Selectors like #nextWord aren't much use for real parsing, so if I use it at all it's likely to be just in throwaway code.

It does seem slightly dodgy that it skips all the whitespace *before* the word, and consumes only one character of whitespace *after* the word, though.

-- chris
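The asymmetry between leading and trailing whitespace is easiest to see on input with runs of several spaces. The following is a rough Python model of the behaviour as described in this thread, not Dolphin's code; `next_word` and its index-based interface are assumptions made for the illustration.

```python
def next_word(text, pos):
    """Return (word, new_pos), modelling the described 5.1 behaviour."""
    # ALL leading separators are skipped...
    while pos < len(text) and text[pos].isspace():
        pos += 1
    start = pos
    while pos < len(text) and not text[pos].isspace():
        pos += 1
    end = pos
    # ...but only ONE trailing separator is consumed.
    if pos < len(text):
        pos += 1
    return text[start:end], pos


text = "   foo   bar"
word, pos = next_word(text, 0)
remaining = text[pos:]   # "  bar": two of the three spaces are still there
```

After the call the position sits in the middle of a whitespace run, so it marks neither the end of the word nor the start of the next one.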
"Ian Bartholomew" <[hidden email]> wrote in message
news:c1obmu$1k77sh$[hidden email]...
> [re: #nextWord consuming the first trailing delimiter from the stream]
> ...
> The same thing applies to #nextLine - should it absorb the line
> delimiter from the stream or not.
> ...

Yes. From ANSI:

"Each object in the receiver's future sequence values up to and including the first occurrence of the objects that constitute an implementation defined end-of-line sequence is removed..."

#nextWord is consistent with this in that it removes the first trailing separator. It is inconsistent with #nextLine in that it skips leading separators, but this is necessary because tabs, spaces, etc. can form part of a "line", but not a "word" (in this sense).

I don't really have a strong opinion about whether the current implementation is right or wrong (and don't intend to change it), but I think this does need to be explained in the method comment.

Regards

Blair