I am developing a replacement for SmaCC (far from complete).
Once it is completed, I am interested in being able to define languages and have code in these languages placed directly in a Smalltalk (probably Squeak) image. I haven't figured out how this will work. Somewhere there will need to be a specification that a given class or method contains not Smalltalk code but code from some language X. The compiler for language X would generate Smalltalk byte codes that would be stored in the image much like the byte codes of Smalltalk methods.

Some example languages:
1) Regular expressions
2) bash shell scripts (Linux)
3) sed scripts (Linux)
4) C

There are tons of issues here to be resolved much later.

Meanwhile, I thought of two languages worth adding to the list:

4) SLang
5) Assembler for the virtual machine.

The reason for 4) is to allow users to write code that they mean to be efficient. My understanding is that SLang is C-like and gives similar performance. Of course, translating SLang methods to byte codes will slow it down, but it should still be useful when performance matters. The reason for 5) is the same, though I admit it would be odd for someone so committed to performance that he writes his code in assembler to then choose to write it for a Smalltalk virtual machine.

Questions:

a) Is it a good idea to be able to embed other languages/language compilers into a Smalltalk image? I am sure many will object to this idea, but I hope there are some supporters as well, at least in principle. At this point I am just interested in hearing what the pros and cons are.

b) I am just writing an easier to use, more powerful, and faster SmaCC. To generate byte codes from a language specification, more compiler utilities are needed. Has anyone done any work in this area, either in Squeak or in other versions of Smalltalk?
c) If efficiency is desired when converting a regular expression to a finite state machine, then one option is to represent the finite state machine in machine code or in a language that supports computed gotos and collections of goto locations. A transition in the finite state machine might be represented by an instruction of the form:

    goto followStatesOfThisState at: inputValue ifAbsent: [self reportInvalidInputAndExit].

If I write a compiler to generate Squeak virtual machine byte codes corresponding to an input regular expression, will I be able to generate instructions equivalent to the goto instruction above?

Of course there are many problems. The biggest concern I have is that we can't allow the user to generate virtual machine codes that blow the system out of the water. Admittedly, this problem could kill the whole idea. Perhaps there are security issues as well.

Once the above has been completed, I hope to start another project. I want to build a replacement for the Linux shell bash, which I call squash.

Squash would be a running Squeak image that provides the user with an interface similar to the bash shell in Linux. It would also be possible to write squash shell scripts that provide functionality similar to bash shell scripts. Unlike bash shell scripts, however, squash shell scripts would be compiled into files of byte codes called squash executable files. It would also be possible to have other languages whose compilers generate squash executable files and have the squash shell execute them.

What does all of this give us (at least on Linux)? Well, I think it gives us:

1) A version of Squeak that interfaces with other languages much better than now, thus making Squeak (Smalltalk) much more useful to the software community in general.
2) The option of using Squeak where scripting languages are now used. And since in the Squeak case the code is compiled into byte codes, it should run faster than would be the case for many scripting languages.
3) Since the squash shell scripting language will have the entire Squeak image available to it, it should be a powerful language to write shell scripts in.

Comments welcome, especially negative (but constructive) ones. I am looking for comments on whether this is a good or bad idea, not a detailed description of what all the problems are, though the latter may be useful too.

Ralph Boland
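Question (c) concerns table-driven transitions. If no computed-goto bytecode is available (an assumption here, not something confirmed in this thread), a common fallback is a loop that looks up the next input character in the current state's table of follow states. Here is a minimal sketch in Python, used only for illustration. The regular expression a(b|c)*d and its hand-built state table are made-up examples.

```python
# Hypothetical example: a hand-built DFA for the regular expression  a(b|c)*d
# Each state maps an input character to its follow state, playing the role of
# "followStatesOfThisState" in the goto instruction from question (c).
TRANSITIONS = {
    0: {"a": 1},
    1: {"b": 1, "c": 1, "d": 2},
    2: {},
}
ACCEPTING = {2}


def run_dfa(text, transitions=TRANSITIONS, accepting=ACCEPTING, start=0):
    """Return True if the whole of `text` drives the DFA into an accepting state."""
    state = start
    for ch in text:
        follow = transitions[state]
        next_state = follow.get(ch)  # "at: inputValue ifAbsent: [...]"
        if next_state is None:
            return False             # stands in for reportInvalidInputAndExit
        state = next_state           # the "goto": continue from the follow state
    return state in accepting


if __name__ == "__main__":
    for sample in ["ad", "abccbd", "a", "abx"]:
        print(sample, run_dfa(sample))
```

A compiler could emit one such table per state as a literal and keep the loop generic. The cost is one dictionary lookup per character instead of a direct jump.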
On 3-May-07, at 1:34 PM, Ralph Boland wrote:

>
> 4) SLang
> The reason for 4) is to allow users to write code that they mean
> to be efficient. My understanding is that SLang is C like and gives
> similar performance. Of course translating SLang methods to byte
> codes will slow it down but it should still be useful when
> performance matters.

Er, not quite. Slang is Smalltalk written in a rather ugly pidgin-C style so that the CCodeGenerator and associated classes can digest it. See http://wiki.squeak.org/squeak/slang . It is already 'translated to byte codes', since it simply compiles normally and is run normally when using the InterpreterSimulator.

tim
--
tim Rowledge; http://www.rowledge.org/tim
Strange OpCodes: XER: Exclusive ERror
Ralph Boland wrote:
> Comments welcome, especially negative (but constructive) ones.
> I am looking for comments on whether this is a good or bad idea,
> not a detailed description of what all the problems are,
> though the latter may be useful too.

There is a class method #compilerClass. So you can define a class method that indicates the compiler to use, and the browser will use that compiler for your class.

Anybody can generate bytecodes; there is nothing magical about them. However, save your image often, because bad bytecodes can crash your image.

SmaCC is not bad, and it will not be easy to do better. I bet the reason you are trying to redo it is that you have written parser generators before, and figure it will be a good starting project for Squeak.

Writing a parser and interpreter for other languages is cool. There are already ones for Lisp and Prolog. However, this doesn't entirely solve the interoperability problem, since the basic datatypes of the languages often differ.

Making a shell for Squeak is also a good idea, but it wouldn't necessarily require a new parser, let alone a parser generator.

-Ralph
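The warning about bad bytecodes and Ralph Boland's worry about generated code that "blows the system out of the water" both suggest checking generated code before installing it. Below is a minimal sketch of such a static check, written in Python for illustration. It targets a made-up toy stack machine: the opcodes and their stack effects are hypothetical, and they are not the Squeak bytecode set, whose rules differ. The check walks every control-flow path and rejects unknown opcodes, stack underflow, out-of-range jumps, falling off the end, and paths that reach the same instruction with different stack depths.

```python
# Toy instruction set (hypothetical, NOT Squeak's): opcode -> (values popped, values pushed)
STACK_EFFECT = {
    "push": (0, 1),
    "add": (2, 1),
    "pop": (1, 0),
    "jump": (0, 0),
    "jump_if_false": (1, 0),
    "return": (1, 0),
}


def verify(code):
    """Statically check a list of (opcode, arg) pairs before running it.

    Follows every control-flow path while tracking stack depth, and raises
    ValueError on any problem; returns True if the code is acceptable.
    """
    if not code:
        raise ValueError("empty method")
    depth_at = {0: 0}  # pc -> stack depth on entry to that instruction
    worklist = [0]
    while worklist:
        pc = worklist.pop()
        op, arg = code[pc]
        if op not in STACK_EFFECT:
            raise ValueError(f"unknown opcode {op!r} at {pc}")
        pops, pushes = STACK_EFFECT[op]
        depth = depth_at[pc]
        if depth < pops:
            raise ValueError(f"stack underflow at {pc}")
        depth = depth - pops + pushes
        if op == "return":
            successors = []
        elif op == "jump":
            successors = [arg]
        elif op == "jump_if_false":
            successors = [pc + 1, arg]
        else:
            successors = [pc + 1]
        for target in successors:
            # Jumping out of range and falling off the end are both rejected here.
            if not (isinstance(target, int) and 0 <= target < len(code)):
                raise ValueError(f"bad control transfer to {target!r} from {pc}")
            if target not in depth_at:
                depth_at[target] = depth
                worklist.append(target)
            elif depth_at[target] != depth:
                raise ValueError(f"inconsistent stack depth at {target}")
    return True
```

A real verifier for the Squeak VM would have to model far more (temporaries, literals, block contexts, jump ranges). This sketch shows only the general shape of such a check.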
2007/5/3, Ralph Boland wrote:
> I am developing a replacement for SmaCC (far from complete).
> Once it is completed I am interested in being able to define languages
> and have code in these languages placed directly in a
> Smalltalk (probably Squeak) image.
> I haven't figured out how this will work.
> Somewhere there will need to be a specification
> that a given class or method contains not Smalltalk code
> but code from some language X.

Override #compilerClass. IIRC, compilers for Logo and JS were once made using this technique. There will of course be issues with the debugger; you'd have to at least implement a decompiler.

One thing to remember about jumps in Squeak bytecodes is that their lengths are quite limited. You might also want to consider generating the NewCompiler's IR instead of bytecodes directly.

Cheers
Philippe
On May 3, 2007, at 16:34 , Ralph Boland wrote:
> I am developing a replacement for SmaCC (far from complete).
> Once it is completed I am interested in being able to define languages
> and have code in these languages placed directly in a
> Smalltalk (probably Squeak) image.

There is Marcus' Babel package in Croquet that implements a couple of different syntaxes. It's SmaCC-based.

And very recently Alex and Yoshiki ported the Meta parser from Coke to Squeak, this is a couple of changesets in the OLPC image.

Might be worth checking out.

- Bert -
> And very recently Alex and Yoshiki ported the Meta parser from Coke
> to Squeak, this is a couple of changesets in the OLPC image.
>
> Might be worth checking out.

Wow, thank you for mentioning it, Bert!

The good things about META in Squeak are its size (10 classes including the bootstrap parser), its meta-circular definition (it is written in itself), the ability to write actions anywhere in production definitions (actually, there is no real distinction between actions and lhs syntactical symbols), integration with the Squeak browser (you simply write production rules in a browser), no real need to write the tokenizer and parser separately, and linear-time parsing with unlimited lookahead and backtracking.

There are very few predefined things. For example, a tokenizer for parsing a number (the equivalent of

"<number> : [0-9]+ (\. [0-9]*) ? ;"

and "{'1' value asNumber}" in SmaCC) would be written in Meta like a method:

--------------------
number ::=
  `0`:r (<char>:c ?`c isDigit` `r * 10 + c digitValue`:r)+ `r`
--------------------

Things enclosed by back-ticks (`) are like things in {} in SmaCC. A colon (:) followed by a name is like naming in SmaCC ('expression'), but this is a real assignment (so the calculated value in the long action is re-assigned to the same variable). A question mark followed by back-ticks (?``) is a predicate that tells whether the parsing should continue or not. A name enclosed by <> is a method call (or production call). And the last item in the production is the result value. This is a tokenizer example, but parser productions are written in the same way.

We haven't gotten around to porting the left-recursion case over to the Squeak version. Without left recursion, it can be painful sometimes. We plan to port that to Meta in Squeak sometime soon.

Ah, yes, the bottom line is that I think it is interesting and worth to take a look at.

-- Yoshiki
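For comparison, here is what the Meta `number` production does, written out as an ordinary hand-coded scanning function. This is a Python sketch for illustration only. The function name `number` and its (value, next position) return convention are assumptions and are not part of Meta. Like the Meta rule, and unlike the SmaCC regular expression, it handles only the integer part.

```python
def number(text, pos=0):
    """Hand-written equivalent of the Meta rule
        number ::= `0`:r (<char>:c ?`c isDigit` `r * 10 + c digitValue`:r)+ `r`
    Returns (value, next_position) on success, or None if no digit is present.
    """
    r = 0                                                  # `0`:r
    start = pos
    while pos < len(text) and "0" <= text[pos] <= "9":     # <char>:c ?`c isDigit`
        r = r * 10 + (ord(text[pos]) - ord("0"))           # `r * 10 + c digitValue`:r
        pos += 1
    if pos == start:                                       # "+" needs at least one digit
        return None
    return r, pos                                          # `r` is the production's result
```

Returning None instead of raising matches how a PEG production fails. The caller can then try another alternative at the same position.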
> Ah, yes, the bottom line is that I think it is interesting and worth
> to take a look at.

Cool, but where is it available from?

Lukas

--
Lukas Renggli
http://www.lukas-renggli.ch
Lukas,
> > Ah, yes, the bottom line is that I think it is interesting and worth
> > to take a look at.
>
> Cool, but where is it available from?

Ah, sorry. The OLPC images are available at:

http://tinlizzie.org/olpc/

and the latest for developers is:

http://tinlizzie.org/olpc/etoys-dev-2.0-1315.zip

-- Yoshiki
Lukas Renggli wrote:
>> Ah, yes, the bottom line is that I think it is interesting and worth
>> to take a look at.
>
> Cool, but where is it available from?
>
> Lukas
>

http://tinlizzie.org/olpc/etoys-dev-2.0-latest.zip

I think you have to update the image.

Karl
What would be nice is to be able to debug squash scripts, because this is always the problem with Unix scripts for me. I'm always saying to myself that in Smalltalk I would have put a breakpoint and fixed it in a tenth of the time.

Stef

On 3 mai 07, at 23:39, Philippe Marschall wrote:

>> Once the above has been completed I hope to start another project.
>> I want to build a replacement for the Linux shell bash which I
>> call squash.
>>
>> Squash would be a running Squeak image which provides the user
>> with an interface similar to when running the bash shell in Linux.
>> It would also be possible to write squash shell scripts that provided
>> functionality similar to bash shell scripts. Unlike bash shell scripts,
>> however, squash shell scripts would be compiled into files of byte codes
>> called squash executable files.
>> It would also be possible to have other languages whose compilers
>> generate squash executable files and have the squash shell execute them.
Hi Yo
did you publish it on squeaksource or place like that :)

Stef

On 4 mai 07, at 02:21, Yoshiki Ohshima wrote:

>> And very recently Alex and Yoshiki ported the Meta parser from Coke
>> to Squeak, this is a couple of changesets in the OLPC image.
>>
>> Might be worth checking out.
>
> Wow, thank you for mentioning it, Bert!
Hi Stef,
> did you publish it on squeaksource or place like that :)

Ah, making changesets that bootstrap META properly requires some work (I did it once, but merging subsequent patches and doing it again sounds a bit tedious). If I get around to it, I'll do it, but I'm equally happy if somebody else pulls that off.

BTW, until yesterday I wasn't aware that "not-not" is "yes", i.e., that I can synthesize the and-predicate (in the parsing expression grammar sense) by using the not-predicate twice. So I can say that META can express the popular example of a context-sensitive language:

a^{n} b^{n} c^{n} : n > 0

with something like:

--------------------
MTokenizer subclass: #MABC
	instanceVariableNames: ''
	classVariableNames: ''
	poolDictionaries: ''
	category: 'Meta-Examples'!

!MABC methodsFor: 'productions' stamp: 'yo 5/4/2007 11:15'!
a ::= $a <a> $b | `true`! !

!MABC methodsFor: 'productions' stamp: 'yo 5/4/2007 11:23'!
aa ::= <a>:a ?`a ~= true`! !

!MABC methodsFor: 'productions' stamp: 'yo 5/4/2007 11:24'!
b ::= $b <b> $c | `true`! !

!MABC methodsFor: 'productions' stamp: 'yo 5/4/2007 11:26'!
bb ::= <b>:b ?`b ~= true`! !

!MABC methodsFor: 'productions' stamp: 'yo 5/4/2007 11:28'!
s ::= ~(~(<aa> ~$b)) $a+ <bb> ~$c! !
--------------------

and I can test that with a do-it:

--------------------
MABC match: 'aaabbbccc' with: #s
--------------------

-- Yoshiki
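The double-negation trick can also be shown outside Meta with a few parsing-expression combinators. The following Python sketch is hypothetical, and the combinator names are made up for this example. It mirrors the MABC grammar: A and B play the roles of aa and bb, and the and-predicate is built as not_(not_(p)). The whole-input check in `matches` is an added assumption, since the exact semantics of `match:with:` are not stated here.

```python
from typing import Callable, Optional

# A parser takes (text, position) and returns the new position, or None on failure.
Parser = Callable[[str, int], Optional[int]]


def char(ch: str) -> Parser:
    """Match one literal character (Meta's $a)."""
    def parse(s, i):
        return i + 1 if i < len(s) and s[i] == ch else None
    return parse


def seq(*parsers: Parser) -> Parser:
    """Match each parser in turn; fail as soon as one fails."""
    def parse(s, i):
        for p in parsers:
            i = p(s, i)
            if i is None:
                return None
        return i
    return parse


def opt(p: Parser) -> Parser:
    """p? : try p; if it fails, succeed without consuming input."""
    def parse(s, i):
        j = p(s, i)
        return i if j is None else j
    return parse


def plus(p: Parser) -> Parser:
    """p+ : one or more greedy repetitions (p must consume input)."""
    def parse(s, i):
        i = p(s, i)
        if i is None:
            return None
        while (j := p(s, i)) is not None:
            i = j
        return i
    return parse


def not_(p: Parser) -> Parser:
    """!p (Meta's ~): succeed, consuming nothing, exactly when p fails."""
    def parse(s, i):
        return i if p(s, i) is None else None
    return parse


def and_(p: Parser) -> Parser:
    """&p built as !!p, since "not-not" is "yes"."""
    return not_(not_(p))


def A(s, i):
    """A <- 'a' A? 'b'  matches a^k b^k for k >= 1 (the role of aa)."""
    return seq(char("a"), opt(A), char("b"))(s, i)


def B(s, i):
    """B <- 'b' B? 'c'  matches b^k c^k for k >= 1 (the role of bb)."""
    return seq(char("b"), opt(B), char("c"))(s, i)


# S <- &(A !'b') 'a'+ B !'c'   mirrors   s ::= ~(~(<aa> ~$b)) $a+ <bb> ~$c
S = seq(and_(seq(A, not_(char("b")))), plus(char("a")), B, not_(char("c")))


def matches(text: str) -> bool:
    """True if S consumes the entire input."""
    return S(text, 0) == len(text)
```

The lookahead checks that the a's and b's balance without consuming anything. The parse then re-reads the a's and lets B check that the b's and c's balance.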
>From: "Ralph Boland"
>Reply-To: The general-purpose Squeak developers list
>Subject: multi-language GUI / shells in Smalltalk
>Date: Thu, 3 May 2007 16:34:18 -0400
>
> b) I am just writing an easier to use, more powerful, and faster SmaCC.
>To generate byte codes from a language specification
>more compiler utilities are needed.
>Has anyone done any work in this area,
>either in Squeak or other versions of Smalltalk?

What style of parser is it? Is it combinator-style, like Parsec?
Step and everybody,
> did you publish it on squeaksource or place like that :)

I finally uploaded a SAR package (I sort of gave up on making an mcz... sorry) and created a SqueakMap entry for the META implementation in Squeak.

http://map.squeak.org/account/package/e3cdcb13-a408-49c8-9a97-5e9b8befa4ac

This package includes a sample implementation of a Squeak Smalltalk-80 parser that can generate a simple intermediate form, then a ParseNode tree, and then Squeak bytecode. I spent some time making sure this generates bytecode identical to that of the default Squeak compiler. Now, a test like the following:

------------
| m sq o |
o := Morph.
[o selectors do: [:sel |
	sq := SqueakParseNodeBuilder new encoder: (Encoder new init: o context: nil notifying: nil).
	m := (sq parse: (MSqueakParser2 match: (o sourceCodeAt: sel) string with: #method)) generate: #(0 0 0 0).
	(m decompileClass: o selector: sel) printString = (o decompile: sel) printString
		ifFalse: [self halt].
	]] timeToRun
------------

compiles the source code, decompiles the result, and compares it with the decompiled form of the method compiled by the default compiler. For example, all the methods of Morph seem to generate identical bytecode.

My compiler doesn't do any fancy error handling... However, with the unlimited lookahead feature, it *should* be easier to write the necessary syntax error detection in the grammar itself.

-- Yoshiki
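The test above is a differential check: compile every method with the new front end and compare the result against the stock compiler. The same idea can be sketched in Python as an analogy, because CPython's built-in `compile()` also accepts an AST object. The tiny grammar and the names `parse_expr`, `compile_with_my_parser` and `same_bytecode` are hypothetical and exist only for this illustration.

```python
import ast


def parse_expr(src):
    """Hypothetical recursive-descent parser for
       expr <- term ('+' term)* ; term <- atom ('*' atom)* ; atom <- NAME | INT
    that builds Python AST nodes, left-associative like Python's own parser."""
    tokens = src.replace("+", " + ").replace("*", " * ").split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def atom():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        if tok.isdecimal():
            return ast.Constant(value=int(tok))
        if tok.isidentifier():
            return ast.Name(id=tok, ctx=ast.Load())
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():
        nonlocal pos
        node = atom()
        while peek() == "*":
            pos += 1
            node = ast.BinOp(left=node, op=ast.Mult(), right=atom())
        return node

    def expr():
        nonlocal pos
        node = term()
        while peek() == "+":
            pos += 1
            node = ast.BinOp(left=node, op=ast.Add(), right=term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"trailing input at {peek()!r}")
    return tree


def compile_with_my_parser(src):
    """Our front end plus CPython's back end."""
    tree = ast.Expression(body=parse_expr(src))
    ast.fix_missing_locations(tree)  # fill in the line/column info compile() requires
    return compile(tree, "<my-parser>", "eval")


def same_bytecode(src):
    """Differential test against CPython's own parser on the same source."""
    mine = compile_with_my_parser(src)
    ref = compile(src, "<reference>", "eval")
    return (mine.co_code, mine.co_consts, mine.co_names) == (
        ref.co_code, ref.co_consts, ref.co_names)
```

Running `same_bytecode` over a large body of real source code, as the Morph loop does, catches mistakes such as wrong associativity that are easy to miss when inspecting output by hand.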
Just reposting with a proper subject and fixed link ;)
- Bert -

On May 22, 2007, at 23:03 , Yoshiki Ohshima wrote:

> Step and everybody,
>
>> did you publish it on squeaksource or place like that :)
>
> I finally uploaded a SAR package (sort of gave up on making
> mcz... sorry) and created a SqueakMap entry for the META
> implementation in Squeak.
>
> http://map.squeak.org/account/package/e3cdcb13-a408-49c8-9a97-5e9b8befa4ac

Actually, http://map.squeak.org/package/e3cdcb13-a408-49c8-9a97-5e9b8befa4ac
> Just reposting with a proper subject and fixed link ;)
Ooh, thanks!

Just to be clear, this is a knock-down version of the "real" Meta in Jolt. Later on, we might try to incorporate new features and different internal structures from that into this Squeak version.

-- Yoshiki