Smalltalk › Pharo › Pharo Smalltalk Users

Catching EOF in SmaCC

Prof. Andrew P. Black

Catching EOF in SmaCC

Is there a way, in SmaCC, of either

- writing a grammar production that involves EOF (end of file)
- or, writing a scanner action that is executed when EOF is read.

Andrew

Thierry Goubier

Re: Catching EOF in SmaCC

Hi Andrew,

there is an 'E O F' token generated by SmaCC; I haven't tried to use it
in a parser yet.

The second (a scanner action at end of file) is used in the Python2 parser. See:

https://github.com/ThierryGoubier/SmaCC/blob/master/SmaCC-Python.package/PythonScanner2.class/instance/scannerError.st

Regards,

Thierry

On 17/11/2017 at 04:11, Prof. Andrew P. Black wrote:
> Is there a way, in SmaCC, of either
>
> - writing a grammar production that involves EOF (end of file)
> - or, writing a scanner action that is executed when EOF is read.
>
> Andrew

Prof. Andrew P. Black

Re: Catching EOF in SmaCC

> On 17 Nov 2017, at 14:10 , Thierry Goubier <[hidden email]> wrote:
>
>
> there is an 'E O F' token generated by SmaCC; I haven't tried to use it in a parser yet.

I tried patching the tokenActions table to trap on this, but the token id for E O F is outside of the range of the table. The Python example that you pointed me to is a little different. It overrides scannerError, and explicitly adds a newline token if there is an error at the end of the file. It doesn’t actually use the E O F token, but it is probably a pattern that I can steal.
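The scannerError pattern described above can be sketched outside SmaCC. What follows is a minimal Python sketch, not SmaCC code: the token names (NEWLINE, SEMI, WORD, EOF), the regular expression and the `scan` function are all hypothetical. Instead of overriding a `scannerError` hook, it simply checks, once the input is exhausted, whether the last token was a separator, and appends a synthetic NEWLINE if it was not.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


# Hypothetical token set for a tiny statement language.
TOKEN_RE = re.compile(r"(?P<NEWLINE>\n)|(?P<SEMI>;)|(?P<WORD>\w+)|(?P<SKIP>[ \t]+)")


def scan(source: str) -> List[Token]:
    """Tokenize source, then supply a missing final separator at end of input."""
    tokens: List[Token] = []
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        pos = m.end()
        if m.lastgroup != "SKIP":  # drop spaces and tabs
            tokens.append(Token(m.lastgroup, m.group()))
    # End of input reached: this is where an "EOF action" would run.
    # Like the Python2 scanner trick, inject the separator the file forgot.
    if tokens and tokens[-1].kind not in ("NEWLINE", "SEMI"):
        tokens.append(Token("NEWLINE", "\n"))
    tokens.append(Token("EOF", ""))
    return tokens
```

With this, `scan("a; b")` yields the kinds WORD, SEMI, WORD, NEWLINE, EOF, so a grammar that requires a separator after every statement still accepts a file without a trailing newline.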

In the meantime, I made the final StatementSeparator (<newline> or ";") optional in all the productions. The grammar is a bit ugly, but the parser is cleaner.
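Making the final separator optional amounts to accepting `Statement (Separator Statement)* [Separator]` before end of input. As a hedged illustration only, here is a small hand-written Python recognizer rather than a SmaCC grammar. The name `parse_program` is hypothetical, and the input is a token list in which statements are plain strings and separators are "\n" or ";".

```python
from typing import List

SEPARATORS = {"\n", ";"}


def parse_program(tokens: List[str]) -> List[str]:
    """Recognize  Program ::= Statement (Sep Statement)* [Sep]  up to end of input.

    Each non-separator token counts as one statement; the trailing
    separator is optional, so input need not end with a newline or ';'.
    """
    statements: List[str] = []
    i = 0

    def expect_statement() -> None:
        nonlocal i
        if i >= len(tokens) or tokens[i] in SEPARATORS:
            raise SyntaxError(f"statement expected at token {i}")
        statements.append(tokens[i])
        i += 1

    expect_statement()
    while i < len(tokens) and tokens[i] in SEPARATORS:
        i += 1  # consume the separator
        if i == len(tokens):
            break  # optional final separator: end of input may follow it
        expect_statement()
    if i != len(tokens):
        raise SyntaxError(f"separator expected at token {i}")
    return statements
```

`parse_program(["a", ";", "b"])` and `parse_program(["a", ";", "b", "\n"])` both return `["a", "b"]`, while `["a", "b"]` is rejected because the two statements are not separated.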

I also gave up trying to eliminate intermediate parseTree nodes. Instead, I eliminated intermediate productions from the grammar. This makes the grammar more ugly (it has several repetitions where I inlined the intermediate productions), but the tree construction is a lot more straightforward.

Andrew

Thierry Goubier

Re: Catching EOF in SmaCC

Hi Andrew,

On 17/11/2017 at 12:26, Prof. Andrew P. Black wrote:
>
>> On 17 Nov 2017, at 14:10 , Thierry Goubier <[hidden email]> wrote:
>>
>>
>> there is an 'E O F' token generated by SmaCC; I haven't tried to use it in a parser yet.
>
> I tried patching the tokenActions table to trap on this, but the token id for E O F is outside of the range of the table. The Python example that you pointed me to is a little different. It overrides scannerError, and explicitly adds a newline token if there is an error at the end of the file. It doesn’t actually use the E O F token, but it is probably a pattern that I can steal.

To be honest, I wasn't thinking of that; I had in mind being able to
write '<eof>' in the grammar itself to terminate statements.

The Python approach is necessary because you may have to emit additional
dedent tokens at the end of a file. This is a typical issue with languages
where indentation whitespace is meaningful: an idea used at the very
beginning of programming languages, then considered harmful, and now
coming back again...
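To see why dedent tokens have to be produced at end of file, here is a minimal Python sketch of an indentation-tracking tokenizer. It is hypothetical and unrelated to SmaCC's PythonScanner2: the function name `layout_tokens` and the token names are made up, it only counts leading spaces, and it treats each non-blank line as a single LINE token. It keeps a stack of open indentation widths, and any block still open when the input ends must be closed by DEDENT tokens emitted at that point.

```python
from typing import List


def layout_tokens(source: str) -> List[str]:
    """Emit INDENT/DEDENT/NEWLINE markers for an indentation-sensitive language."""
    tokens: List[str] = []
    stack = [0]  # indentation widths of the currently open blocks
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines do not affect indentation
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)
            tokens.append("INDENT")
        while width < stack[-1]:
            stack.pop()
            tokens.append("DEDENT")
        if width != stack[-1]:
            raise IndentationError(f"inconsistent dedent in line {line!r}")
        tokens.append("LINE")
        tokens.append("NEWLINE")
    # End of file: close every block still open. Without this step, a file
    # that ends inside an indented block leaves the parser waiting for DEDENTs.
    while len(stack) > 1:
        stack.pop()
        tokens.append("DEDENT")
    tokens.append("EOF")
    return tokens
```

For example, `layout_tokens("if x:\n  y\n")` returns `["LINE", "NEWLINE", "INDENT", "LINE", "NEWLINE", "DEDENT", "EOF"]`. The final DEDENT comes only from the end-of-file loop, because no later line exists to trigger it.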

>
> In the meantime, I made the final StatementSeparator (<newline> or ";") optional in all the productions. The grammar is a bit ugly, but the parser is cleaner.

That is the cleanest way to do it (at least that way, you have a
documented workaround instead of carrying around a grammar plus hacks
in the scanner)(*).

> I also gave up trying to eliminate intermediate parseTree nodes. Instead, I eliminated intermediate productions from the grammar. This makes the grammar more ugly (it has several repetitions where I inlined the intermediate productions), but the
> tree construction is a lot more straightforward.

Sorry I was unable to answer your questions on that :( I'm
happy to hear you've found a way around it.

Thierry

(*) Which is still way better than a hand-written recursive descent
parser where any line can hide a hack...

> Andrew