multi-language GUI / shells in Smalltalk

Ralph Boland

multi-language GUI / shells in Smalltalk

I am developing a replacement for SmaCC (far from complete).
Once it is completed I am interested in being able to define languages
and have code in these languages placed directly in a
Smalltalk (probably Squeak) image.
I haven't figured out how this will work.
Somewhere there will need to be a specification
that a given class or method contains not Smalltalk code
but code from some language X.
The compiler for language X would generate Smalltalk byte codes
that would be stored in the image much like the byte codes of Smalltalk methods.
Some example languages:
   1) Regular expressions
   2) bash shell scripts (Linux)
   3) sed scripts    (Linux)
   4) C

There are tons of issues here to be resolved much later.

Meanwhile, I thought of two languages worth adding to the list:

     5) SLang
     6) Assembler for the virtual machine.

The reason for 5) is to allow users to write code that they mean
to be efficient. My understanding is that SLang is C-like and gives
similar performance. Of course translating SLang methods to byte
codes will slow it down but it should still be useful when performance matters.
The reason for 6) is the same, though I admit it would be odd for
someone so committed to performance that he writes his code in
assembler to then choose to write it for a Smalltalk virtual machine.

Questions:

   a) Is it a good idea to be able to embed other languages/language compilers
into a Smalltalk image? I am sure many will object to this idea
but I hope there are some supporters as well, at least in principle.
At this point I am just interested in hearing what the pros and cons are.

   b) I am just writing an easier to use, more powerful, and faster SmaCC.
To generate byte codes from a language specification
more compiler utilities are needed.
Has anyone done any work in this area,
either in Squeak or other versions of Smalltalk?

c) If efficiency is desired when converting a regular expression
to a finite state machine, then one option is to represent
the finite state machine in machine code or a language that
supports computed gotos and collections of goto locations.
A transition in the finite state machine might be represented by
an instruction of the form:

goto followStatesOfThisState at: inputValue ifAbsent: [self reportInvalidInputAndExit].

If I write a compiler to generate Squeak virtual machine byte codes
corresponding to an input regular expression
will I be able to generate instructions
equivalent to the goto instruction above?
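
For comparison, here is a sketch of the same idea in plain Smalltalk. It does not emit byte codes or use a goto; it drives the machine from a table, with at:ifAbsent: playing the role of the computed jump. The machine (for the regular expression ab*c) and the names transitions, accepts and state are made up for illustration.

```smalltalk
"Table-driven recognizer for the regular expression ab*c.
 Each state maps an input character to its follow state."
| transitions accepts state |
transitions := Dictionary new.
transitions at: #start put: (Dictionary new at: $a put: #body; yourself).
transitions at: #body put: (Dictionary new at: $b put: #body; at: $c put: #done; yourself).
transitions at: #done put: Dictionary new.

"Walk the input; a missing table entry turns the state into nil (invalid input)."
accepts := [:input |
    state := #start.
    input do: [:char |
        state isNil ifFalse: [
            state := (transitions at: state) at: char ifAbsent: [nil]]].
    state == #done].

accepts value: 'abbbc'.
```

Evaluating the last line in a workspace should answer true, while an input such as 'abx' answers false because the lookup falls into the ifAbsent: block.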

Of course there are many problems. The biggest concern I have is that
we can't allow the user to generate virtual machine codes that blow the
system out of the water. Admittedly this problem could kill the whole
idea. Perhaps there are security issues as well.

Once the above has been completed I hope to start another project.
I want to build a replacement for the Linux shell bash which I call squash.

Squash would be a running Squeak image which provides the user with an
interface similar to when running the bash shell in Linux.
It would also be possible to write squash shell scripts that provided functionality
similar to bash shell scripts. Unlike bash shell scripts, however, squash shell
scripts would be compiled into files of byte codes called squash executable files.
It would also be possible to have other languages whose compilers generate
squash executable files and have the squash shell execute them.

What does all of this give us (at least on Linux)?
Well, I think it gives us:

1) A version of Squeak that interfaces with other languages much better
than now, thus making Squeak (Smalltalk) much more useful to the
software community in general.
2) The option of using Squeak where scripting languages are now used.
And since in the Squeak case the code is compiled into byte codes it
should run faster than would be the case for many scripting languages.
   3) Since the squash shell scripting language
will have the entire Squeak image available to it,
it should be a powerful language to write shell scripts in.

Comments welcome, especially negative (but constructive) ones.
I am looking for comments on whether this is a good or bad idea,
not a detailed description of what all the problems are,
though the latter may be useful too.

Ralph Boland

timrowledge

Re: multi-language GUI / shells in Smalltalk

On 3-May-07, at 1:34 PM, Ralph Boland wrote:
>
> 4) SLang

> The reason for 4) is to allow users to write code that they mean
> to be efficient. My understanding is that SLang is C like and gives
> similar performance. Of course translating SLang methods to byte
> codes will slow it down but it should still be useful when
> performance matters.

Er, not quite. Slang is Smalltalk written in a rather ugly pidgin-C
style so that the CCodeGenerator and associated classes can digest
it. See http://wiki.squeak.org/squeak/slang . It is already
'translated to byte codes' since it simply compiles normally and is
run normally when using the InterpreterSimulator.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Strange OpCodes: XER: Exclusive ERror

Ralph Johnson

Re: multi-language GUI / shells in Smalltalk

In reply to this post by Ralph Boland

> Comments welcome, especially negative (but constructive) ones.
> I am looking for comments on whether this is a good or bad idea,
> not a detailed description of what all the problems are,
> though the latter may be useful too.

There is a class method "compilerClass". So, you can define a class
method that indicates the compiler to use, and the browser will use
that compiler for your class.
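
A minimal sketch of that hook, written as a workspace script. The class RegexSnippets and its category are hypothetical, and the override simply answers the standard Compiler so that the snippet stays runnable; a real language X would answer its own compiler class there.

```smalltalk
"Define a hypothetical class whose methods should be compiled specially."
Object subclass: #RegexSnippets
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Sketch-Languages'.

"Override compilerClass on the class side. Answering Compiler keeps the
 default behaviour; a compiler for language X would be answered here instead."
(Smalltalk at: #RegexSnippets) class
    compile: 'compilerClass
    ^ Compiler'
    classified: 'compiling'.

"Ask the class which compiler its methods will be compiled with."
(Smalltalk at: #RegexSnippets) compilerClass.
```

The class is looked up through Smalltalk at: because the whole script is compiled before the class exists.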

Anybody can generate bytecodes, there is nothing magical about them.
However, save your image often, because bad bytecodes can crash your
image.

SmaCC is not bad, and it will not be easy to do better. I bet the
reason you are trying to redo it is that you have written parser
generators before, and figure it will be a good starting project for
Squeak.

Writing a parser and interpreter for other languages is cool. There
are already ones for Lisp and Prolog. However, this doesn't entirely
solve the interoperability problem, since the basic datatypes of the
languages often differ. Making a shell for Squeak is also a good
idea, but it wouldn't necessarily require a new parser, let alone a
parser generator.

-Ralph

Philippe Marschall

Re: multi-language GUI / shells in Smalltalk

In reply to this post by Ralph Boland

2007/5/3, Ralph Boland <[hidden email]>:
> I am developing a replacement for SmaCC (far from complete).
> Once it is completed I am interested in being able to define languages
> and have code in these languages placed directly in a
> Smalltalk (probably Squeak) image.
> I haven't figured out how this will work.
> Somewhere there will need to be a specification
> that a given class or method contains not Smalltalk code
> but code from some language X.

override #compilerClass
IIRC once compilers for Logo and JS were made using this technique.
There will of course be issues with the debugger, you'd have to at
least implement a decompiler.
One thing to remember about jumps in Squeak bytecodes is that they
are limited in how far they can jump.
You might also want to consider generating IR of the NewCompiler
instead of bytecodes directly.

Cheers
Philippe

Bert Freudenberg

Re: multi-language GUI / shells in Smalltalk

In reply to this post by Ralph Boland

On May 3, 2007, at 16:34 , Ralph Boland wrote:

> I am developing a replacement for SmaCC (far from complete).
> Once it is completed I am interested in being able to define languages
> and have code in these languages placed directly in a
> Smalltalk (probably Squeak) image.

There is Marcus' Babel package in Croquet that implements a couple of
different syntaxes. It's SmaCC-based.

And very recently Alex and Yoshiki ported the Meta parser from Coke
to Squeak, this is a couple of changesets in the OLPC image.

Might be worth checking out.

- Bert -

Yoshiki Ohshima

Re: multi-language GUI / shells in Smalltalk

> And very recently Alex and Yoshiki ported the Meta parser from Coke
> to Squeak, this is a couple of changesets in the OLPC image.
>
> Might be worth checking out.

Wow, thank you for mentioning it, Bert!

The good thing about META in Squeak is its size (10 classes
including the bootstrap parser), meta circular definition (it is
written in itself), ability to write actions anywhere in production
definitions (actually, there is no real distinction between actions
and lhs syntactical symbols), integration with Squeak browser (you
simply write production rules in a browser), no real need to write
a tokenizer and parser separately, and linear-time parsing with
unlimited lookahead and backtracking.

There are very few predefined things. For example, a tokenizer
rule for parsing a number (an equivalent of

"<number> : [0-9]+ (\. [0-9]*) ? ;"

and "{'1' value asNumber}" in SmaCC) would be written in Meta like a method:

--------------------
number ::=
`0`:r (<char>:c ?`c isDigit` `r * 10 + c digitValue`:r)+ `r`
--------------------

Things enclosed by back-ticks (`) are like things in {} in SmaCC. A
colon (:) followed by a name is like naming in SmaCC ('expression'),
but it is a real assignment. (So, the value calculated in the long
action is re-assigned to the same variable.) A question mark followed
by back-ticks (?``) is a predicate that tells whether the parsing
should continue or not. A name enclosed by <> is a method call (or
production call). The last item in the production is the result
value. This is a tokenizer example, but parser productions are
written in the same way.
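
For comparison, roughly the same recognizer can be written by hand in plain Smalltalk. This is only a sketch over a ReadStream, not what META generates; the sample input and the names stream, r and count are chosen for illustration.

```smalltalk
"Read one or more decimal digits and accumulate their value,
 mirroring the loop in the number production."
| stream r count |
stream := ReadStream on: '1234abc'.
r := 0.
count := 0.
[stream atEnd not and: [stream peek isDigit]] whileTrue: [
    r := r * 10 + stream next digitValue.
    count := count + 1].
"The + in the production demands at least one digit."
count = 0 ifTrue: [self error: 'digit expected'].
r
```

Evaluated in a workspace this answers 1234 and leaves the stream positioned at 'abc', much as the production would stop at the first non-digit.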

We haven't gotten around to porting the left-recursion support over
to the Squeak version. Without left recursion, writing grammars can
be painful at times. We plan to port it to Meta in Squeak sometime soon.

Ah, yes, the bottom line is that I think it is interesting and worth
taking a look at.

-- Yoshiki

Lukas Renggli

Re: multi-language GUI / shells in Smalltalk

> Ah, yes, the bottom line is that I think it is interesting and worth
> to take a look at.

Cool, but where is it available from?

Lukas

--
Lukas Renggli
http://www.lukas-renggli.ch

Yoshiki Ohshima

Re: multi-language GUI / shells in Smalltalk

Lukas,

> > Ah, yes, the bottom line is that I think it is interesting and worth
> > to take a look at.
>
> Cool, but where is it available from?

Ah sorry. The OLPC images are available at:

http://tinlizzie.org/olpc/

and, the latest for developers is:

http://tinlizzie.org/olpc/etoys-dev-2.0-1315.zip

-- Yoshiki

Karl-19

Re: multi-language GUI / shells in Smalltalk

In reply to this post by Lukas Renggli

Lukas Renggli wrote:
>> Ah, yes, the bottom line is that I think it is interesting and worth
>> to take a look at.
>
> Cool, but where is it available from?
>
> Lukas
>
http://tinlizzie.org/olpc/etoys-dev-2.0-latest.zip

I think you have to update the image.

Karl

stephane ducasse

Re: multi-language GUI / shells in Smalltalk

In reply to this post by Philippe Marschall

What would be nice is to be able to debug squash scripts, because this
is always the problem with Unix scripts for me. I'm always saying to
myself that in Smalltalk I would have put in a breakpoint and fixed it
in a tenth of the time.

Stef

On 3 May 07, at 23:39, Philippe Marschall wrote:

>> Once the above has been completed I hope to start another project.
>> I want to build a replacement for the Linux shell bash which I
>> call squash.
>>
>> Squash would be a running Squeak image which provides the user
>> with an
>> interface similar to when running the bash shell in Linux.
>> It would also be possible to write squash shell scripts that provided
>> functionality
>> similar to bash shell scripts. Unlike bash shell scripts,
>> however, squash
>> shell
>> scripts would be compiled into files of byte codes called squash
>> executable
>> files.
>> It would also be possible to have other languages whose compilers
>> generate
>> squash executable files and have the squash shell execute them.

stephane ducasse

Re: multi-language GUI / shells in Smalltalk

In reply to this post by Yoshiki Ohshima

Hi Yo

Did you publish it on SqueakSource or a place like that? :)

Stef

Yoshiki Ohshima

Re: multi-language GUI / shells in Smalltalk

Hi Stef,

> did you publish it on squeaksource or place like that :)

Ah, making changesets that bootstrap META properly requires some
work (I did it once, but merging subsequent patches and doing it again
sounds a bit tedious). If I get around to it, I'll do it, but I'm
equally happy if somebody else pulls that off.

BTW, until yesterday I wasn't aware that "not-not" is "yes", that is,
that I can synthesize the and-predicate (in the parsing expression
grammar sense) by using the not-predicate twice. So, I can say that
META can express the popular example of a context-sensitive language:

a^{n} b^{n} c^{n} : n > 0

with something like:

--------------------
MTokenizer subclass: #MABC
instanceVariableNames: ''
classVariableNames: ''
poolDictionaries: ''
category: 'Meta-Examples'!

!MABC methodsFor: 'productions' stamp: 'yo 5/4/2007 11:15'!
a ::= $a <a> $b | `true`! !

!MABC methodsFor: 'productions' stamp: 'yo 5/4/2007 11:23'!
aa ::= <a>:a ?`a ~= true`! !

!MABC methodsFor: 'productions' stamp: 'yo 5/4/2007 11:24'!
b ::= $b <b> $c | `true`! !

!MABC methodsFor: 'productions' stamp: 'yo 5/4/2007 11:26'!
bb ::= <b>:b ?`b ~= true`! !

!MABC methodsFor: 'productions' stamp: 'yo 5/4/2007 11:28'!
s ::= ~(~(<aa> ~$b)) $a+ <bb> ~$c! !
--------------------

and I can test that with a do-it:

--------------------
MABC match: 'aaabbbccc' with: #s
--------------------
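
As a cross-check for such a grammar, the language itself is easy to decide directly in plain Smalltalk. The block below is only a reference checker with a made-up name (isABC); it does not use META or any predicate machinery.

```smalltalk
"Answer whether a string has the form a^n b^n c^n with n > 0,
 by rebuilding the expected string for n = size // 3 and comparing."
| isABC n |
isABC := [:string |
    n := string size // 3.
    n > 0 and: [
        string = ((String new: n withAll: $a) ,
                  (String new: n withAll: $b) ,
                  (String new: n withAll: $c))]].

isABC value: 'aaabbbccc'.
```

Strings on which this block and the grammar disagree would point to a mistake in one of the productions.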

-- Yoshiki

J J-6

RE: multi-language GUI / shells in Smalltalk

In reply to this post by Ralph Boland

>From: "Ralph Boland" <[hidden email]>
>Reply-To: The general-purpose Squeak developers
>list<[hidden email]>
>To: [hidden email]
>Subject: multi-language GUI / shells in Smalltalk
>Date: Thu, 3 May 2007 16:34:18 -0400
>
> b) I am just writing an easier to use, more powerful, and faster SmaCC.
>To generate byte codes from a language specification
>more compiler utilities are needed.
>Has anyone done any work in this area,
>either in Squeak or other versions of Smalltalk?

What style of parser is it? Is it a combinator-style like parsec?

Yoshiki Ohshima

Re: multi-language GUI / shells in Smalltalk

In reply to this post by stephane ducasse

Stef and everybody,

> did you publish it on squeaksource or place like that :)

I finally uploaded a SAR package (sort of gave up on making
mcz... sorry) and created a SqueakMap entry for the META
implementation in Squeak.

http://map.squeak.org/account/package/e3cdcb13-a408-49c8-9a97-5e9b8befa4ac

This package includes a sample implementation of a Squeak Smalltalk-80
parser that generates a simple intermediate form, then ParseNodes,
and then Squeak bytecode. I spent some time making sure it generates
bytecode identical to that of the default Squeak compiler...
Now, a test like the following:

------------
| m sq o |
o := Morph.
[o selectors do: [:sel |
sq := SqueakParseNodeBuilder new encoder: (Encoder new init: o context: nil notifying: nil).
m := (sq parse: (MSqueakParser2 match: (o sourceCodeAt: sel) string with: #method)) generate: #(0 0 0 0).
(m decompileClass: o selector: sel) printString = (o decompile: sel) printString ifFalse: [self halt].
]] timeToRun
------------

compiles the source code, decompiles the result, and compares it
with the method compiled by the default compiler. For example, all
the methods of Morph seem to generate identical bytecode.

My compiler doesn't do any fancy error handling... However, with
the unlimited lookahead feature, it *should* be easier to write
necessary syntax error detection in the grammar itself.

-- Yoshiki

Bert Freudenberg

[ANN] META for Squeak

Just reposting with a proper subject and fixed link ;)

- Bert -

On May 22, 2007, at 23:03 , Yoshiki Ohshima wrote:

> Step and everybody,
>
>> did you publish it on squeaksource or place like that :)
>
> I finally uploaded a SAR package (sort of gave up on making
> mcz... sorry) and created a SqueakMap entry for the META
> implementation in Squeak.
>
> http://map.squeak.org/account/package/e3cdcb13-a408-49c8-9a97-5e9b8befa4ac

Actually, http://map.squeak.org/package/e3cdcb13-a408-49c8-9a97-5e9b8befa4ac

Yoshiki Ohshima

Re: [ANN] META for Squeak

> Just reposting with a proper subject and fixed link ;)

Ooh, thanks!

Just to be clear, this is a stripped-down version of the "real" Meta
in Jolt. Later on, we might try to incorporate new features and
different internal structures from that into this Squeak version.

-- Yoshiki