Understanding the role of the sources file


Re: Understanding the role of the sources file

David Allouche

On 13 Jan 2016, at 16:27, Dimitris Chloupis <[hidden email]> wrote:

"The virtual machine (VM) provides the environment where the Pharo system lives. It is different for each operating system and hardware architecture, and runs as a machine language executable in the operating system. It implements the details of managing memory, executing Pharo byte-code, and communicating with the world outside of the Pharo system: files, other operating system processes, and the network."

No, the environment is the image; the VM is basically what its name says: a machine emulated by software. The vast majority of the tools, and even the language itself, reside in the image. The VM is there so that the code can execute and interface with the underlying operating system. You could completely replace the VM, for example move it to the JVM, and the Pharo environment would still be intact.

That is indeed the idea I tried to convey by "where the Pharo system lives", but I see how that can be misunderstood. However, I try to avoid using "big words" like "abstraction".

How about this?

"The virtual machine (VM) provides the portable environment to execute Pharo images. Its implementation needs to be different for each operating system and hardware architecture, as it runs as a machine language executable in the operating system. It implements the details of managing memory, executing Pharo byte-code, and communicating with the world outside of the Pharo system: files, other operating system processes, and the network."


"Hi Dimitris,
your formulation "...Pharo bytecode...and convert it to machine code..."
is insofar irritating to me as "convert it to machine code" would
suggest to me that a compiler is at work here. David's "executing Pharo
byte-code" seems more understandable to me here."

That's correct, it is a compiler, a bytecode compiler: it compiles bytecode to machine code, and it does so while the code executes. This is why it is called JIT, for Just In Time compilation: machine code is generated just before the code is executed, so several optimizations can be applied that could not be known before execution. It is similar to Java's JIT compiler.

Note that a compiler is not only something that produces machine code; a compiler can, for example, take one language and compile it to another language.

That's technically true. But most readers will probably unconsciously assume that a compiler is something that produces machine language. The document should be careful to avoid misunderstandings caused by such common assumptions.

As for the JIT, that is totally an implementation detail, and I believe it is only worth mentioning if you want to prevent the reader from assuming that Pharo is slow because its VM executes byte-code. But nowadays, JIT compilation of byte-code to machine language is the norm rather than the exception, so I do not think it is worth mentioning.



On Wed, Jan 13, 2016 at 4:58 PM Werner Kassens <[hidden email]> wrote:
Hi Dimitris,
your formulation "...Pharo bytecode...and convert it to machine code..."
is insofar irritating to me as "convert it to machine code" would
suggest to me that a compiler is at work here. David's "executing Pharo
byte-code" seems more understandable to me here.
werner

On 01/13/2016 02:22 PM, Dimitris Chloupis wrote:
> I assume you have never read an introduction to C++ then :D
>
> Here is the final addition for the VM:
>
> The virtual machine (VM) is the only component that is different for each
> operating system. The main purpose of the VM is to take Pharo bytecode that
> is generated each time the user accepts a piece of code and convert it to
> machine code in order to be executed, but also to generally handle low-level
> functionality like interpreting code, handling OS events (mouse and
> keyboard), calling C libraries etc. Pharo 4 comes with the Cog VM, a very
> fast JIT VM.
>
> I think it's clear, precise and does not leave much room for confusion.
> Personally I think it's very important for the absolute beginner to have
> strong foundations for understanding the fundamentals of Pharo, and not for
> things to appear magical and "don't touch this".
>
> On Wed, Jan 13, 2016 at 2:54 PM Sven Van Caekenberghe <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>
>      > On 13 Jan 2016, at 13:42, Dimitris Chloupis
>     <[hidden email] <mailto:[hidden email]>> wrote:
>      >
>      > I mentioned bytecode because I don't want the user to see bytecode
>     at some point and say "What the hell is that?" I want the reader to
>     feel confident that they at least understand the basics of Pharo. I
>     have also seen very brief explanations of bytecode in similar Python
>     tutorials. Obviously I don't want to go any deeper than that because
>     the user won't have to worry about the technical details on a daily
>     basis anyway.
>      >
>      > I agree that I could add a bit more to the VM description, similar
>     to what you posted. I am curious though: won't even the interpreter
>     generate machine code in order to execute the code, or does it use
>     existing machine code inside the VM binary?
>
>     No, a classic interpreter does not 'generate' machine code, it is
>     just a program that reads and executes byte codes in a loop; the
>     interpreter 'is' machine code.
>
>     No offence, but you see why I think it is important to not try to
>     use or explain too many complex concepts in the 1st chapter.
>
>     Learning to program is hard. It should first be done abstractly.
>     Think about Scratch. The whole idea of Smalltalk is to create a
>     world of interacting objects. (Even byte code is not a necessary
>     concept at all, for example, in Pharo, you can compile (translate)
>     to AST and execute that, I believe. There are Smalltalk
>     implementations that compile directly to C or JavaScript). Hell,
>     even 'compile' is not necessary, just 'accept'. See ?
>
>      > On Wed, Jan 13, 2016 at 2:25 PM Sven Van Caekenberghe
>     <[hidden email] <mailto:[hidden email]>> wrote:
>      > Sounds about right.
>      >
>      > Now, I would swap 1 and 4, as the image is the most important
>     abstraction.
>      >
>      > There is also a bit too much emphasis on (byte|source)code. This
>     is already pretty technical (it assumes you know what compilation is
>     and so on). But I understand it must be explained here, and you did
>     it well.
>      >
>      > However, I would start by saying that the image is a snapshot of
>     the object world in memory that is effectively a live Pharo system.
>     It contains everything that is available and that exists in Pharo.
>     This includes any objects that you created yourself, windows,
>     browsers, open debuggers, executing processes, all meta objects as
>     well as all representations of code.
>      >
>      > <sidenote>
>      > The fact that there is a sources and changes file is an
>     implementation artefact, not something fundamental. There are ideas
>     to change this in the future (but you do not have to mention that).
>      > </sidenote>
>      >
>      > Also, the VM not only executes code, it maintains the object
>     world, which includes the ability to load and save it from and to an
>     image. It creates a portable (cross platform) abstraction that
>     isolates the image from the particular details of the underlying
>     hardware and OS. In that role it implements the interface with the
>     outside world. I would mention that second part before mentioning
>     the code execution.
>      >
>      > The sentence "The purpose of the VM is to take Pharo bytecode that
>     is generated each time user accepts a piece of code and convert it
>     to machine code in order to be executed." is not 100% correct. It is
>     possible to execute the byte code without converting it. This is
>     called interpretation. JIT is a faster technique that includes
>     converting (some often used) byte code to machine code and caching that.
>      >
>      > I hope this helps (it is hard to write a 'definitive explanation'
>     as there are so many aspects to this and it depends on the
>     context/audience).
>      >
>      > > On 13 Jan 2016, at 12:58, Dimitris Chloupis
>     <[hidden email] <mailto:[hidden email]>> wrote:
>      > >
>      > > So I am correct that the image does not store the source code,
>     and that the source code is stored in sources and changes. The only
>     difference is that the objects have a source variable that points to
>     the right place for finding the source code.
>      > >
>      > > This is the final text if you find anything incorrect please
>     correct me
>      > >
>      > > ---------------
>      > >
>      > > 1. The virtual machine (VM) is the only component that is
>     different for each operating system. The purpose of the VM is to
>     take Pharo bytecode that is generated each time the user accepts a
>     piece of code and convert it to machine code in order to be executed.
>     Pharo 4 comes with the Cog VM, a very fast JIT VM. The VM executable
>     is named:
>      > >
>      > > • Pharo.exe for Windows;
>      > > • pharo for Linux; and
>      > > • Pharo for OSX (inside a package also named Pharo.app).
>      > >
>      > > The other components below are portable across operating
>     systems, and can be copied and run on any appropriate virtual machine.
>      > >
>      > > 2. The sources file contains source code for parts of Pharo
>     that don’t change frequently. The sources file is important because the
>     image file format stores only the bytecode of live objects and not
>     their source code. Typically a new sources file is generated once
>     per major release of Pharo. For Pharo 4.0, this file is named
>     PharoV40.sources.
>      > >
>      > > 3. The changes file logs all source code modifications since
>     the .sources file was generated. This facilitates a per-method
>     history for diffs or reverting. That means that even if you don't
>     manage to save the image file before a crash, or you just forgot, you
>     can recover your changes from this file. Each release provides a nearly
>     empty file named for the release, for example Pharo4.0.changes.
>      > >
>      > > 4. The image file provides a frozen-in-time snapshot of a
>     running Pharo system. This is the file where the Pharo bytecode is
>     stored, and as such it is a cross-platform format. This is the heart of
>     Pharo, containing the live state of all objects in the system
>     (including classes and methods, since they are objects too). The
>     file is named for the release (like Pharo4.0.image).
>      > >
>      > > The .image and .changes files provided by a Pharo release are
>     the starting point for a live environment that you adapt to your
>     needs. Essentially the image file contains the compiler of the
>     language (not the VM), the language parser, the IDE tools and many
>     libraries, and acts a bit like a virtual operating system that runs
>     on top of a virtual machine (VM), similarly to ISO files.
>      > >
>      > > As you work in Pharo, these files are modified, so you need to
>     make sure that they are writable. The .image and .changes files are
>     intimately linked and should always be kept together, with matching
>     base filenames. Never edit them directly with a text editor, as
>     the .image file holds your live object runtime memory, which indexes
>     into the .changes file for the source. It is a good idea to keep a
>     backup copy of the downloaded .image and .changes files so you can
>     always start from a fresh image and reload your code. However, the
>     most efficient way of backing up code is to use a version control
>     system, which provides an easier and more powerful way to back up and
>     track your changes.
>      > >
>      > > The four main component files above can be placed in the same
>     directory, although it’s also possible to put the Virtual Machine
>     and sources file in a separate directory where everyone has
>     read-only access to them.
>      > >
>      > > If more than one image file is present in the same directory,
>     Pharo will prompt you to choose the image file you want to load.
>      > >
>      > > Do whatever works best for your style of working and your
>     operating system.
>      > >
>      > >
>      > >
>      > >
>      > >
>      > > On Wed, Jan 13, 2016 at 12:13 PM Sven Van Caekenberghe
>     <[hidden email] <mailto:[hidden email]>> wrote:
>      > >
>      > > > On 13 Jan 2016, at 10:57, Dimitris Chloupis
>     <[hidden email] <mailto:[hidden email]>> wrote:
>      > > >
>      > > > I was adding a short description of the sources file to the
>     UPBE. I always thought that the sources file is the file that
>     contains the source code of the image, because the image file itself
>     stores only the bytecode.
>      > > >
>      > > > However, it just came to my attention that the sources file
>     does not contain code that was recently installed in the image.
>      > > >
>      > > > So how exactly does the sources file work, and what is it?
>      > >
>      > > The main perspective is from the object point of view: methods
>     are just objects like everything else. In order to be executable
>     they know their byte codes (which might be JIT compiled on
>     execution, but that is an implementation detail) and they know their
>     source code.
>      > >
>      > > Today we would probably just store the source code strings in
>     the image (maybe compressed) as memory is pretty cheap. But way back
>     when Smalltalk started, that was not the case. So they decided to
>     map the source code out to files.
>      > >
>      > > So method source code is a magic string (RemoteString) that
>     points to some position in a file. There are 2 files in use: the
>     sources file and the changes file.
>      > >
>      > > The sources file is a kind of snapshot of the source code of
>     all methods at the point of release of a major new version. That is
>     why there is a Vxy in its name. The sources file never changes once
>     created or renewed (a process called generating the sources, see
>     PharoSourcesCondenser).
>      > >
>      > > While developing and creating new versions of methods, the new
>     source code is appended to another file called the changes file,
>     much like a transaction log. This is also a safety mechanism to
>     recover 'lost' changes.
>      > >
>      > > The changes file can contain multiple versions of a method.
>     This can be reduced in size using a process called condensing the
>     changes, see PharoChangesCondenser.
>      > >
>      > > On a new release, the changes file will be (almost) empty.
>      > >
>      > > HTH,
>      > >
>      > > Sven
>      > >
>      > >
>      > >
>      >
>      >
>
>
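
Editorial sketch: the mechanism Sven describes above (method source kept as a pointer into one of two files, an append-only changes log, and condensing) can be illustrated roughly as follows. This is a Python analogy rather than Pharo code, and every name in it (`SourcePointer`, `SourceFiles`, `Image`, `accept`, `condense_changes`, `generate_sources`) is hypothetical. It does not reproduce Pharo's actual file format, RemoteString, PharoChangesCondenser or PharoSourcesCondenser.

```python
import io
from dataclasses import dataclass

SOURCES, CHANGES = 0, 1  # hypothetical file indexes, not Pharo's real encoding


@dataclass(frozen=True)
class SourcePointer:
    """Stands in for the 'magic string': which file, where, and how long."""
    file_index: int
    position: int
    length: int


class SourceFiles:
    """Two in-memory text files playing the sources file and the changes file."""

    def __init__(self):
        self.files = [io.StringIO(), io.StringIO()]

    def append(self, file_index, text):
        # Always write at the end, like a transaction log.
        f = self.files[file_index]
        f.seek(0, io.SEEK_END)
        position = f.tell()
        f.write(text)
        return SourcePointer(file_index, position, len(text))

    def read(self, pointer):
        f = self.files[pointer.file_index]
        f.seek(pointer.position)
        return f.read(pointer.length)


class Image:
    """Keeps only pointers to source text; the text itself lives in the files."""

    def __init__(self, source_files):
        self.source_files = source_files
        self.methods = {}  # method name -> SourcePointer

    def accept(self, name, source):
        # Each accepted version is appended; older versions stay in the log.
        self.methods[name] = self.source_files.append(CHANGES, source)

    def source_of(self, name):
        return self.source_files.read(self.methods[name])

    def condense_changes(self):
        # Rewrite the changes file so it holds only the latest version of each method.
        latest = {name: self.source_of(name)
                  for name, ptr in self.methods.items()
                  if ptr.file_index == CHANGES}
        self.source_files.files[CHANGES] = io.StringIO()
        for name, source in latest.items():
            self.accept(name, source)

    def generate_sources(self):
        # New release: snapshot all current source into a fresh sources file
        # and start again with an empty changes file.
        current = {name: self.source_of(name) for name in self.methods}
        self.source_files.files = [io.StringIO(), io.StringIO()]
        for name, source in current.items():
            self.methods[name] = self.source_files.append(SOURCES, source)
```

In this sketch, accepting the same method twice leaves both versions in the changes log until `condense_changes` is called, which is the behaviour that makes the log usable for recovering 'lost' changes.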



Re: Understanding the role of the sources file

kilon.alios
In reply to this post by Ben Coman
It does not make as big a difference as you think. First of all, the VM is an executable binary, so it is already in machine code format, or else it would not run. The difference is that with the interpreter you execute machine code that already exists, without any need to generate it, while the JIT generates optimized machine code depending on usage and context, so it runs much faster than the machine code used by the interpreter.

The bottom line is the same: machine code is executed in both cases. Also, from the link you provided it is clear that the only code that won't be compiled to machine code is code that is rarely used, code that you execute once and forget about, which would not need to run fast anyway. In that case, yes, the interpreter will be used. However, there is a ton of code that needs to run repeatedly, whether it is drawing the GUI, examining an event or parsing a file. In most cases you will be using some kind of loop, and in that case the JIT will compile to machine code and you will see a very significant boost in speed, depending on the case.

But all of those are implementation details that I don't want people to deal with. I only want the reader to have a very general idea of what the VM is and why Pharo needs it; I don't care about the specifics.
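
Editorial sketch: the distinction discussed in this thread (an interpreter that reads and executes byte codes in a loop, versus a JIT that translates frequently used code once and caches the result) can be illustrated with a toy stack machine. This is a Python analogy only. The opcodes, the `HOT_THRESHOLD` value and the `Method` class are invented for the illustration, the "translation" produces a Python function rather than machine code, and none of it reflects how the Cog VM is actually implemented.

```python
HOT_THRESHOLD = 3  # hypothetical: translate a method once it has run this often


def interpret(bytecode, arg):
    """Classic interpreter: read and execute byte codes in a loop."""
    stack = [arg]
    for op, operand in bytecode:
        if op == "PUSH":
            stack.append(operand)
        elif op == "ADD":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack.pop()


def translate(bytecode):
    """Stand-in for a JIT: convert the byte codes once into a host function."""
    stack = ["arg"]  # symbolic stack holding expression text
    for op, operand in bytecode:
        if op == "PUSH":
            stack.append(repr(operand))
        elif op in ("ADD", "MUL"):
            b = stack.pop()
            a = stack.pop()
            sign = "+" if op == "ADD" else "*"
            stack.append(f"({a} {sign} {b})")
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return eval(f"lambda arg: {stack.pop()}")


class Method:
    """A method knows its byte codes; translated code is only a cache."""

    def __init__(self, bytecode):
        self.bytecode = bytecode
        self.calls = 0
        self.compiled = None

    def run(self, arg):
        if self.compiled is not None:
            return self.compiled(arg)  # fast path: reuse the cached translation
        self.calls += 1
        if self.calls >= HOT_THRESHOLD:
            self.compiled = translate(self.bytecode)  # ready for the next call
        return interpret(self.bytecode, arg)
```

Code that runs only once or twice never reaches the threshold and stays interpreted, while code called in a loop is translated after a few calls. Either way the caller sees the same result, which is why the thread treats the JIT as an implementation detail.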

On Wed, Jan 13, 2016 at 6:26 PM Ben Coman <[hidden email]> wrote:
> On Wed, Jan 13, 2016 at 4:58 PM Werner Kassens <[hidden email]> wrote:
>>
>> Hi Dimitris,
>> your formulation "...Pharo bytecode...and convert it to machine code..."
>> is insofar irritating to me as "convert it to machine code" would
>> suggest to me that a compiler is at work here. David's "executing Pharo
>> byte-code" seems more understandable to me here.

On Wed, Jan 13, 2016 at 11:27 PM, Dimitris Chloupis
<[hidden email]> wrote:
> That's correct, it is a compiler, a bytecode compiler: it compiles bytecode
> to machine code, and it does so while the code executes. This is why it is
> called JIT, for Just In Time compilation: machine code is generated just
> before the code is executed, so several optimizations can be applied that
> could not be known before execution. It is similar to Java's JIT compiler.
>
> Note that a compiler is not only something that produces machine code;
> a compiler can, for example, take one language and compile it to another
> language.

Indeed.  The OpalCompiler takes Smalltalk and produces bytecode.

However, I think Sven and Werner were referring to the fact that much
Pharo code is not JITed, but *merely* interpreted (IIUC).  See the section
"Not so smart questions" here...
https://clementbera.wordpress.com/2014/01/09/the-sista-chronicles-i-an-introduction-to-adaptive-recompilation/

cheers -ben



Re: Understanding the role of the sources file

kilon.alios
In reply to this post by David Allouche
It is no accident that the Cog VM is mentioned in my documentation. There is another VM, called the Stackless VM or old VM, which was the VM before Cog and had no JIT. We still use it for platforms that the Cog VM does not currently run on, like Raspberry Pi, Android and, I think, iOS too.


"But nowadays, JIT compilation of byte-code to machine language is the norm rather than the exception, so I do not think it is worth mentioning."

Definitely not the norm. Python and Ruby VMs are not JIT; generally speaking, JIT compilers are rare for dynamic languages like Pharo. Python has PyPy, which is a JIT VM, but it is nowhere near as popular as CPython.

CPython people try to avoid a JIT because it makes the VM architecture much more complex. At least that's their argument.

Another big advantage of Pharo is that it has the luxury of coming with a JIT that is included and well tested.

The only popular JITs out there are Java's and .NET's. I don't know whether JavaScript's V8 uses a JIT as well, but that's pretty much it. Bytecode, however, is very popular indeed.
On Wed, Jan 13, 2016 at 6:43 PM David Allouche <[hidden email]> wrote:
On 13 Jan 2016, at 16:27, Dimitris Chloupis <[hidden email]> wrote:

"The virtual machine (VM) provides the environment where the Pharo system lives. It is different for each operating system and hardware architecture, and runs as a machine language executable in the operating system. It implements the details of managing memory, executing Pharo byte-code, and communicating with the world outside of the Pharo system: files, other operating system processes, and the network."

No, the environment is the image; the VM is basically what its name says, a machine emulated by software. The vast majority of the tools, and even the language itself, reside in the image. The VM is there so that the code can execute and interface with the underlying operating system. You could completely change the VM, for example move it to the JVM, and the Pharo environment would still be intact.

That is indeed the idea I tried to convey by "where the Pharo system lives", but I see how that can be misunderstood. However, I try to avoid using "big words" like "abstraction".

How about this?

"The virtual machine (VM) provides the portable environment to execute Pharo images. Its implementation needs to be different for each operating system and hardware architecture, as it runs as a machine language executable in the operating system. It implements the details of managing memory, executing Pharo byte-code, and communicating with the world outside of the Pharo system: files, other operating system processes, and the network."


"Hi Dimitris,
your formulation "...Pharo bytecode...and convert it to machine code..."
is insofar irritating to me as "convert it to machine code" would
suggest to me that a compiler is at work here. David's "executing Pharo
byte-code" seems more understandable to me here."

That's correct, it is a compiler: a bytecode compiler. It compiles bytecode to machine code, and it does so while the code executes. This is why it is called a JIT, for Just-In-Time compilation: machine code is compiled just before the code is executed, so several optimizations can be applied that could not be known before execution. It is similar to Java's JIT compiler.
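
A rough sketch may help separate the two ideas discussed here: plain interpretation of bytecode versus translating hot bytecode once and caching the result. Everything in it is made up for illustration: the three-opcode instruction set, the `ToyVM` class and the hot threshold are assumptions, not Pharo's bytecode or Cog's design. A real JIT emits machine code, while this sketch emits a Python closure as a stand-in.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy instruction set: (opcode, operand). Not Pharo bytecode.
Instruction = Tuple[str, int]
Program = Tuple[Instruction, ...]


def interpret(program: Program) -> int:
    """Classic interpreter: decode and execute one instruction at a time."""
    stack: List[int] = []
    for opcode, operand in program:
        if opcode == "PUSH":
            stack.append(operand)
        elif opcode == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return stack.pop()


def translate(program: Program) -> Callable[[], int]:
    """Stand-in for a JIT: translate the whole program once into host code."""
    expressions: List[str] = []
    for opcode, operand in program:
        if opcode == "PUSH":
            expressions.append(str(operand))
        elif opcode in ("ADD", "MUL"):
            right, left = expressions.pop(), expressions.pop()
            symbol = "+" if opcode == "ADD" else "*"
            expressions.append(f"({left} {symbol} {right})")
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    source = f"lambda: {expressions.pop()}"
    return eval(compile(source, "<toy-jit>", "eval"))


class ToyVM:
    """Interprets programs, and translates the ones that run often."""

    HOT_THRESHOLD = 3  # arbitrary value chosen for the sketch

    def __init__(self) -> None:
        self.run_counts: Dict[Program, int] = {}
        self.code_cache: Dict[Program, Callable[[], int]] = {}

    def run(self, program: Program) -> int:
        compiled = self.code_cache.get(program)
        if compiled is not None:
            return compiled()  # fast path: reuse the cached translation
        count = self.run_counts.get(program, 0) + 1
        self.run_counts[program] = count
        if count >= self.HOT_THRESHOLD:
            self.code_cache[program] = translate(program)
        return interpret(program)
```

Both paths give the same answer; the only difference is whether the decoding work is repeated on every run or done once and reused.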

Note here that a compiler is not just something that produces machine code; a compiler can, for example, take one language and compile it to another language.

That's technically true. But most readers will probably unconsciously assume that a compiler is something that produces machine language. The document should be careful to avoid misunderstandings caused by such common assumptions.

As for the JIT, that is totally an implementation detail, and I believe it is only worth mentioning if you want to prevent the reader from assuming that Pharo is slow because its VM executes byte-code. But nowadays, JIT compiling byte-code to machine language is the norm rather than the exception, so I do not think it is worth mentioning.



On Wed, Jan 13, 2016 at 4:58 PM Werner Kassens <[hidden email]> wrote:
Hi Dimitris,
your formulation "...Pharo bytecode...and convert it to machine code..."
is insofar irritating to me as "convert it to machine code" would
suggest to me that a compiler is at work here. David's "executing Pharo
byte-code" seems more understandable to me here.
werner

On 01/13/2016 02:22 PM, Dimitris Chloupis wrote:
> I assume you have never read an introduction to C++ then :D
>
> here is the final addition for the vm
>
> (VM) is the only component that is different for each operating system.
> The main purpose of the VM is to take Pharo bytecode that is generated
> each time the user accepts a piece of code and convert it to machine code
> in order to be executed, but also to generally handle low-level
> functionality like interpreting code, handling OS events (mouse and
> keyboard), calling C libraries etc. Pharo 4 comes with the Cog VM, a very
> fast JIT VM.
>
> I think it's clear and precise, and does not leave much room for confusion.
> Personally I think it's very important for the absolute beginner to have
> a strong foundation in the fundamentals of Pharo, and for things not to
> appear magical and "don't touch this".
>
> On Wed, Jan 13, 2016 at 2:54 PM Sven Van Caekenberghe <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>
>      > On 13 Jan 2016, at 13:42, Dimitris Chloupis
>     <[hidden email] <mailto:[hidden email]>> wrote:
>      >
>      > I mentioned bytecode because I don't want the user to see bytecode
>     at some point and say "What the hell is that?" I want the reader to
>     feel confident that they at least understand the basics of Pharo. I
>     have also seen very brief explanations of bytecode in similar Python
>     tutorials. Obviously I don't want to go any deeper than that, because
>     the user won't have to worry about the technical details on a daily
>     basis anyway.
>      >
>      > I agree that I could add a bit more to the VM description, similar
>     to what you posted. I am curious though: won't even the interpreter
>     generate machine code in order to execute the code, or does it use
>     existing machine code inside the VM binary?
>
>     No, a classic interpreter does not 'generate' machine code. It is
>     just a program that reads and executes byte codes in a loop; the
>     interpreter 'is' machine code.
>
>     No offence, but you see why I think it is important not to try to
>     use or explain too many complex concepts in the 1st chapter.
>
>     Learning to program is hard. It should first be done abstractly.
>     Think about Scratch. The whole idea of Smalltalk is to create a
>     world of interacting objects. (Even byte code is not a necessary
>     concept at all, for example, in Pharo, you can compile (translate)
>     to AST and execute that, I believe. There are Smalltalk
>     implementations that compile directly to C or JavaScript). Hell,
>     even 'compile' is not necessary, just 'accept'. See ?
>
>      > On Wed, Jan 13, 2016 at 2:25 PM Sven Van Caekenberghe
>     <[hidden email] <mailto:[hidden email]>> wrote:
>      > Sounds about right.
>      >
>      > Now, I would swap 1 and 4, as the image is the most important
>     abstraction.
>      >
>      > There is also a bit too much emphasis on (byte|source)code. This
>     is already pretty technical (it assumes you know what compilation is
>     and so on). But I understand it must be explained here, and you did
>     it well.
>      >
>      > However, I would start by saying that the image is a snapshot of
>     the object world in memory that is effectively a live Pharo system.
>     It contains everything that is available and that exists in Pharo.
>     This includes any objects that you created yourself, windows,
>     browsers, open debuggers, executing processes, all meta objects as
>     well as all representations of code.
>      >
>      > <sidenote>
>      > The fact that there is a sources and changes file is an
>     implementation artefact, not something fundamental. There are ideas
>     to change this in the future (but you do not have to mention that).
>      > </sidenote>
>      >
>      > Also, the VM not only executes code, it maintains the object
>     world, which includes the ability to load and save it from and to an
>     image. It creates a portable (cross platform) abstraction that
>     isolates the image from the particular details of the underlying
>     hardware and OS. In that role it implements the interface with the
>     outside world. I would mention that second part before mentioning
>     the code execution.
>      >
>      > The sentence "The purpose of the VM is to take Pharo bytecode that
>     is generated each time the user accepts a piece of code and convert it
>     to machine code in order to be executed." is not 100% correct. It is
>     possible to execute the byte code without converting it. This is
>     called interpretation. JIT is a faster technique that includes
>     converting (some often used) byte code to machine code and caching that.
>      >
>      > I hope this helps (it is hard to write a 'definitive explanation'
>     as there are so many aspects to this and it depends on the
>     context/audience).
>      >
>      > > On 13 Jan 2016, at 12:58, Dimitris Chloupis
>     <[hidden email] <mailto:[hidden email]>> wrote:
>      > >
>      > > So I am correct that the image does not store the source code,
>     and that the source code is stored in sources and changes. The only
>     difference is that the objects have a source variable that points to
>     the right place for finding the source code.
>      > >
>      > > This is the final text; if you find anything incorrect, please
>     correct me
>      > >
>      > > ---------------
>      > >
>      > > 1. The virtual machine (VM) is the only component that is
>     different for each operating system. The purpose of the VM is to
>     take Pharo bytecode that is generated each time the user accepts a
>     piece of code and convert it to machine code in order to be executed.
>     Pharo 4 comes with the Cog VM, a very fast JIT VM. The VM executable
>     is named:
>      > >
>      > > • Pharo.exe for Windows;
>      > > • pharo for Linux; and
>      > > • Pharo for OSX (inside a package also named Pharo.app).
>      > >
>      > > The other components below are portable across operating
>     systems, and can be copied and run on any appropriate virtual machine.
>      > >
>      > > 2. The sources file contains source code for parts of Pharo
>     that don’t change frequently. The sources file is important because the
>     image file format stores only the bytecode of live objects and not
>     their source code. Typically a new sources file is generated once
>     per major release of Pharo. For Pharo 4.0, this file is named
>     PharoV40.sources.
>      > >
>      > > 3. The changes file logs all source code modifications since
>     the .sources file was generated. This facilitates a per-method
>     history for diffs or reverting. That means that even if you don't
>     manage to save the image file on a crash, or you just forgot, you can
>     recover your changes from this file. Each release provides a near
>     empty file named for the release, for example Pharo4.0.changes.
>      > >
>      > > 4. The image file provides a frozen-in-time snapshot of a
>     running Pharo system. This is the file where the Pharo bytecode is
>     stored, and as such it is a cross-platform format. This is the heart of
>     Pharo, containing the live state of all objects in the system
>     (including classes and methods, since they are objects too). The
>     file is named for the release (like Pharo4.0.image).
>      > >
>      > > The .image and .changes files provided by a Pharo release are
>     the starting point for a live environment that you adapt to your
>     needs. Essentially the image file contains the compiler of the
>     language (not the VM), the language parser, the IDE tools and many
>     libraries, and acts a bit like a virtual operating system that runs
>     on top of a virtual machine (VM), similarly to ISO files.
>      > >
>      > > As you work in Pharo, these files are modified, so you need to
>     make sure that they are writable. The .image and .changes files are
>     intimately linked and should always be kept together, with matching
>     base filenames. Never edit them directly with a text editor, as
>     the .image file holds your live object runtime memory, which indexes
>     into the .changes file for the source. It is a good idea to keep a
>     backup copy of the downloaded .image and .changes files so you can
>     always start from a fresh image and reload your code. However, the
>     most efficient way of backing up code is to use a version control
>     system, which provides an easier and more powerful way to back up and
>     track your changes.
>      > >
>      > > The four main component files above can be placed in the same
>     directory, although it’s also possible to put the Virtual Machine
>     and sources file in a separate directory where everyone has
>     read-only access to them.
>      > >
>      > > If more than one image file is present in the same directory,
>     Pharo will prompt you to choose the image file you want to load.
>      > >
>      > > Do whatever works best for your style of working and your
>     operating system.
>      > >
>      > >
>      > >
>      > >
>      > >
>      > > On Wed, Jan 13, 2016 at 12:13 PM Sven Van Caekenberghe
>     <[hidden email] <mailto:[hidden email]>> wrote:
>      > >
>      > > > On 13 Jan 2016, at 10:57, Dimitris Chloupis
>     <[hidden email] <mailto:[hidden email]>> wrote:
>      > > >
>      > > > I was adding a short description of the sources file to the
>     UPBE. I always thought that the sources file is the file that
>     contains the source code of the image, because the image file itself
>     stores only the bytecode.
>      > > >
>      > > > However, it just came to my attention that the sources file
>     does not contain code that was recently installed in the image.
>      > > >
>      > > > So how exactly does the sources file work, and what is it?
>      > >
>      > > The main perspective is from the object point of view: methods
>     are just objects like everything else. In order to be executable
>     they know their byte codes (which might be JIT compiled on
>     execution, but that is an implementation detail) and they know their
>     source code.
>      > >
>      > > Today we would probably just store the source code strings in
>     the image (maybe compressed) as memory is pretty cheap. But way back
>     when Smalltalk started, that was not the case. So they decided to
>     map the source code out to files.
>      > >
>      > > So method source code is a magic string (RemoteString) that
>     points to some position in a file. There are 2 files in use: the
>     sources file and the changes file.
>      > >
>      > > The sources file is a kind of snapshot of the source code of
>     all methods at the point of release of a major new version. That is
>     why there is a Vxy in its name. The sources file never changes once
>     created or renewed (a process called generating the sources, see
>     PharoSourcesCondenser).
>      > >
>      > > While developing and creating new versions of methods, the new
>     source code is appended to another file called the changes file,
>     much like a transaction log. This is also a safety mechanism to
>     recover 'lost' changes.
>      > >
>      > > The changes file can contain multiple versions of a method.
>     This can be reduced in size using a process called condensing the
>     changes, see PharoChangesCondenser.
>      > >
>      > > On a new release, the changes file will be (almost) empty.
>      > >
>      > > HTH,
>      > >
>      > > Sven
>      > >
>      > >
>      > >
>      >
>      >
>
>
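
For readers who want something concrete, the sources/changes mechanism Sven describes in the quoted message above can be sketched roughly like this. It is only an illustration of the idea (a method keeps a pointer into a file instead of the source string itself). The names `SourcePointer` and `SourceStore` are invented, the files are plain in-memory strings, and storing a length is a simplification; none of this is Pharo's actual RemoteString implementation.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SourcePointer:
    """Hypothetical stand-in for a pointer to source text in a file."""
    file_name: str  # "sources" or "changes"
    position: int   # offset of the text inside that file
    length: int     # simplification: real files mark where a chunk ends


class SourceStore:
    """In-memory model of a sources file plus an append-only changes file."""

    def __init__(self) -> None:
        self.files: Dict[str, str] = {"sources": "", "changes": ""}

    def _append(self, file_name: str, text: str) -> SourcePointer:
        position = len(self.files[file_name])
        self.files[file_name] += text
        return SourcePointer(file_name, position, len(text))

    def release(self, method_sources: Dict[str, str]) -> Dict[str, SourcePointer]:
        """New major release: fresh sources file, empty changes file."""
        self.files = {"sources": "", "changes": ""}
        return {
            name: self._append("sources", text)
            for name, text in method_sources.items()
        }

    def accept(self, text: str) -> SourcePointer:
        """A new method version is appended to the changes file, like a log."""
        return self._append("changes", text)

    def read(self, pointer: SourcePointer) -> str:
        data = self.files[pointer.file_name]
        return data[pointer.position:pointer.position + pointer.length]
```

In this model the image would hold only the pointers. Accepting a new version of a method moves its pointer to the changes file, while the old text stays untouched in the sources file, which is why earlier versions can be recovered.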


Re: Understanding the role of the sources file

David Allouche

On 13 Jan 2016, at 18:14, Dimitris Chloupis <[hidden email]> wrote:

It is no accident that the Cog VM is mentioned in my documentation. There is another VM, called the Stackless VM or the old VM, that predates Cog and has no JIT. We still use it on platforms where the Cog VM does not currently run, such as Raspberry Pi, Android and, I think, iOS too.


"But nowadays, JIT compiling byte-code to machine language is the norm rather than the exception, so I do not think it is worth mentioning."

Definitely not the norm: the Python and Ruby VMs are not JIT VMs, and generally speaking JIT compilers are rare for dynamic languages like Pharo. Python has PyPy, which is a JIT VM, but it is nowhere near as popular as CPython.

You are right. The JIT is worth mentioning there.


The CPython people try to avoid a JIT because it makes the VM architecture much more complex. At least that's their argument.

I am getting off-topic, but since I have already written what follows, and I think it is informative, I am leaving it there :-)

Python is a rather large language, with a large number of quirks that need to be preserved to provide full compatibility. It is also a language with advanced introspection, and very dynamic semantics, even more than Smalltalk in some aspects. And its main implementation (CPython) has very highly optimised data structures and memory management.

That makes it really difficult to implement correctly and to improve performance without slowing down other parts. Before PyPy, Google funded the Unladen Swallow project, which tried to produce a Python JIT using LLVM; that project was abandoned. Even PyPy needs to make frequent and expensive guard tests to ensure that the invariants required for JIT are preserved. It does bring a significant speed boost (2x-10x), but at the expense of slow startup and an increased memory footprint (~2x).
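
As a very rough illustration of what a guard test is: specialised code is only valid under some assumption (here, "both operands are ints"), so the assumption has to be re-checked before each use, with a fallback to the generic path when it fails. The `GuardedIntAdd` class is a made-up teaching example and says nothing about how PyPy actually implements its guards.

```python
from typing import Any, Callable


def generic_add(left: Any, right: Any) -> Any:
    """Slow path: handles any operands that support +."""
    return left + right


class GuardedIntAdd:
    """Hypothetical fast path that is only valid while both operands are ints."""

    def __init__(self, fallback: Callable[[Any, Any], Any]) -> None:
        self.fallback = fallback
        self.guard_failures = 0

    def __call__(self, left: Any, right: Any) -> Any:
        # Guard: re-check the assumption on every call. This repeated
        # check is the recurring cost of relying on the fast path.
        if type(left) is int and type(right) is int:
            return left + right  # stands in for specialised machine code
        # Guard failed: give up on the fast path for this call.
        self.guard_failures += 1
        return self.fallback(left, right)
```

The more dynamic the language, the more such assumptions there are to check, which is one way to read the cost described above.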

However, it seems clear that the long term future of Python lies in PyPy. In particular, they have high hopes to solve the lack of multicore support by implementing software transactional memory in the medium term, until hardware transactional memory is advanced enough to be usable there.

Another big advantage of Pharo is that it has the luxury of coming with a well-tested JIT included.

I guess the simplicity of the language makes it relatively easy to produce a JIT. This is indeed a very good thing.


The only popular JITs out there are Java's and .NET's. I don't know if JavaScript's V8 uses a JIT as well, but that's pretty much it. Bytecode, however, is very popular indeed.

V8 does use JIT. I believe it was the first widely deployed language implementation to use a tracing interpreter to JIT a language with latent typing. I believe all the other major JavaScript implementations have followed suit.

This is also the approach used by PyPy. Java and .NET have static typing, so they do not require a tracing interpreter to compile to machine code.

On Wed, Jan 13, 2016 at 6:43 PM David Allouche <[hidden email]> wrote:
On 13 Jan 2016, at 16:27, Dimitris Chloupis <[hidden email]> wrote:

"The virtual machine (VM) provides the environment where the Pharo system lives. It is different for each operating system and hardware architecture, and runs as a machine language executable in the operating system. It implements the details of managing memory, executing Pharo byte-code, and communicating with the world outside of the Pharo system: files, other operating system process, and the network."

No the environment is the image, the VM is basically what its names says, a machine emulated by software. The vast majority of tools, even the language itself reside on the image. VM is there in order for the code to be able to execute and to interface with the underlying Operating System. You could completely modify the VM , for example move it to the JVM and still the pharo enviroment would be intact.

That is indeed the idea I tried to convey by "where the Pharo system lives", but I see how that can be misunderstood. However, I try to avoid using "big words" like "abstraction".

How about this?

"The virtual machine (VM) provides the portable environment to execute Pharo images. Its implementation needs to be different for each operating system and hardware architecture, as it runs as a machine language executable in the operating system. It implements the details of managing memory, executing Pharo byte-code, and communicating with the world outside of the Pharo system: files, other operating system process, and the network. 


"Hi Dimitris,
your formulation "...Pharo bytcode...and convert it to machine code..."
is insofar irritating to me as "convert it to machine code" would
suggest to me that a compiler is at work here. Davids "executing Pharo
byte-code" seems more understandable to me here."

Thats correct its a compiler, a byte compiler, it compiles bytecode to machine code and it does it while the code executes, this is why its called JIT , which has the meaning of Just In Time compilation, meaning that machine code is compiled just before the code is executed so several optimizations can be applied that would not be known before the execution of the code. Similar to JAVA's JIT compiler.

Note here that a compiler is not just something that produces machine code, a compiler for example can take one language and compile it to another language.

That's technically true. But most readers will probably unconsciously assume that a compiler is something that produces machine language. The document should be careful to avoid misunderstandings caused by such common assumptions.

As for the JIT, that is totally an implementation detail, and I believe it is only worth mentioning if you want to prevent the reader from assuming that Pharo is slow because its VM executes byte-code. But nowadays, JIT compiling byte-code to machine language are the norm rather than the exception, so I do not think it is worth mentioning.



On Wed, Jan 13, 2016 at 4:58 PM Werner Kassens <[hidden email]> wrote:
Hi Dimitris,
your formulation "...Pharo bytcode...and convert it to machine code..."
is insofar irritating to me as "convert it to machine code" would
suggest to me that a compiler is at work here. Davids "executing Pharo
byte-code" seems more understandable to me here.
werner

On 01/13/2016 02:22 PM, Dimitris Chloupis wrote:
> I assume you have never read a an introduction to C++ then :D
>
> here is the final addition for the vm
>
> (Vm) is the only component that is different for each operating system.
> The main purpose of the VM is to take Pharo bytcode that is generated
> each time user accepts a piece of code and convert it to machine code in
> order to be executed, but also to generally handle low level
> functionality like interpreting code, handling OS events (mouse and
> keyboard), calling C libraries etc. Pharo 4 comes with the Cog VM a very
> fast JIT VM.
>
> I think its clear, precise and does not leave much room for confusion.
> Personally I think its very important for the absolute begineer to have
> strong foundations of understanding the fundamental of Pharo and not for
> things to appear magical and "dont touch this".
>
> On Wed, Jan 13, 2016 at 2:54 PM Sven Van Caekenberghe <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>
>      > On 13 Jan 2016, at 13:42, Dimitris Chloupis
>     <[hidden email] <mailto:[hidden email]>> wrote:
>      >
>      > I mentioned bytecode because I dont want the user to see at some
>     point bytecode and say "What the hell is that" I want the reader to
>     feel confident that at least understands the basic in Pharo. Also
>     very brief explanations about bytecode I have seen in similar python
>     tutorials. Obviously I dont want to go any deeper than that because
>     the user wont have to worry about the technical details on a daily
>     basis anyway.
>      >
>      > I agree that I could add a bit more on the VM description similar
>     to what you posted. I am curious though, wont even the interpreter
>     generate machine code in order to execute the code  or does it use
>     existing machine code inside the VM binary ?
>
>     No, a classic interpreter does not 'generate' machine code, it is
>     just a program that reads and executes bytes codes in a loop, the
>     interpreter 'is' machine code.
>
>     No offence, but you see why I think it is important to not try to
>     use or explain too much complex concepts in the 1st chapter.
>
>     Learning to program is hard. It should first be done abstractly.
>     Think about Scratch. The whole idea of Smalltalk is to create a
>     world of interacting objects. (Even byte code is not a necessary
>     concept at all, for example, in Pharo, you can compile (translate)
>     to AST and execute that, I believe. There are Smalltalk
>     implementations that compile directly to C or JavaScript). Hell,
>     even 'compile' is not necessary, just 'accept'. See ?
>
>      > On Wed, Jan 13, 2016 at 2:25 PM Sven Van Caekenberghe
>     <[hidden email] <mailto:[hidden email]>> wrote:
>      > Sounds about right.
>      >
>      > Now, I would swap 1 and 4, as the image is the most important
>     abstraction.
>      >
>      > There is also a bit too much emphasis on (byte|source)code. This
>     is already pretty technical (it assume you know what compilation is
>     and so on). But I understand it must be explained here, and you did
>     it well.
>      >
>      > However, I would start by saying that the image is a snapshot of
>     the object world in memory that is effectively a live Pharo system.
>     It contains everything that is available and that exists in Pharo.
>     This includes any objects that you created yourself, windows,
>     browsers, open debuggers, executing processes, all meta objects as
>     well as all representations of code.
>      >
>      > <sidenote>
>      > The fact that there is a sources and changes file is an
>     implementation artefact, not something fundamental. There are ideas
>     to change this in the future (but you do not have to mention that).
>      > </sidenote>
>      >
>      > Also, the VM not only executes code, it maintains the object
>     world, which includes the ability to load and save it from and to an
>     image. It creates a portable (cross platform) abstraction that
>     isolates the image from the particular details of the underlying
>     hardware and OS. In that role it implements the interface with the
>     outside world. I would mention that second part before mentioning
>     the code execution.
>      >
>      > The sentence "The purpose of the VM is to take Pharo bytcode that
>     is generated each time user accepts a piece of code and convert it
>     to machine code in order to be executed." is not 100% correct. It is
>     possible to execute the byte code without converting it. This is
>     called interpretation. JIT is a faster technique that includes
>     converting (some often used) byte code to machine code and caching that.
>      >
>      > I hope this helps (it is hard to write a 'definitive explanation'
>     as there are some many aspects to this and it depends on the
>     context/audience).
>      >
>      > > On 13 Jan 2016, at 12:58, Dimitris Chloupis
>     <[hidden email] <mailto:[hidden email]>> wrote:
>      > >
>      > > So I am correct that the image does not store the source code,
>     and that the source code is stored in sources and changes. The only
>     diffirence is that the objects have a source variable that points to
>     the right place for finding the source code.
>      > >
>      > > This is the final text if you find anything incorrect please
>     correct me
>      > >
>      > > ---------------
>      > >
>      > > 1. The virtual machine (VM) is the only component that is
>     different for each operating system. The purpose of the VM is to
>     take Pharo bytcode that is generated each time user accepts a piece
>     of code and convert it to machine code in order to be executed.
>     Pharo 4 comes with the Cog VM a very fast JIT VM. The VM executable
>     is named:
>      > >
>      > > • Pharo.exe for Windows; • pharo for Linux ; and
>      > >
>      > > • Pharo for OSX (inside a package also named Pharo.app).
>      > > The other components below are portable across operating
>     systems, and
>      > >
>      > > can be copied and run on any appropriate virtual machine.
>      > >
>      > > 2. The sources file contains source code for parts of Pharo
>     that don’t change frequently. Sources file is important because the
>     image file format stores only the bytecode of live objects and not
>     their source code. Typically a new sources file is generated once
>     per major release of Pharo. For Pharo 4.0, this file is named
>     PharoV40.sources.
>      > >
>      > > 3. The changes file logs of all source code modifications since
>     the .sources file was generated. This facilitates a per method
>     history for diffs or re- verting.That means that even if you dont
>     manage to save the image file on a crash or you just forgot you can
>     recover your changes from this file. Each release provides a near
>     empty file named for the release, for example Pharo4.0.changes.
>      > >
>      > > 4. The image file provides a frozen in time snapshot of a
>     running Pharo system. This is the file where the Pharo bytecode is
>     stored and as such its a cross platform format. This is the heart of
>     Pharo, containing the live state of all objects in the system
>     (including classes and methods, since they are objects too). The
>     file is named for the release (like Pharo4.0.image).
>      > >
>      > > The .image and .changes files provided by a Pharo release are
>     the starting point for a live environment that you adapt to your
>     needs. Essentially the image file containes the compiler of the
>     language (not the VM) , the language parser, the IDE tools, many
>     libraries and acts a bit like a virtual Operation System that runs
>     on top of a Virtual Machine (VM), similarly to ISO files.
>      > >
>      > > As you work in Pharo, these files are modified, so you need to
>     make sure that they are writable. The .image and .changes files are
>     intimately linked and should always be kept together, with matching
>     base filenames. Never edit them directly with a text editor, as
>     .images holds your live object runtime memory, which indexes into
>     the .changes files for the source. It is a good idea to keep a
>     backup copy of the downloaded .image and .changes files so you can
>     always start from a fresh image and reload your code. However the
>     most efficient way for backing up code is to use a version control
>     system that will provide an easier and powerful way to back up and
>     track your changes.
>      > >
>      > > The four main component files above can be placed in the same
>     directory, although it’s also possible to put the Virtual Machine
>     and sources file in a separate directory where everyone has
>     read-only access to them.
>      > >
>      > > If more than one image file is present in the same directory
>     pharo will prompt you to choose an image file you want to load.
>      > >
>      > > Do whatever works best for your style of working and your
>     operating system.
>      > >
>      > >
>      > >
>      > >
>      > >
>      > > On Wed, Jan 13, 2016 at 12:13 PM Sven Van Caekenberghe
>     <[hidden email] <mailto:[hidden email]>> wrote:
>      > >
>      > > > On 13 Jan 2016, at 10:57, Dimitris Chloupis
>     <[hidden email] <mailto:[hidden email]>> wrote:
>      > > >
>      > > > I was adding a short description to the UPBE about sources
>     file , I always thought that the sources file is the file that
>     contains the source code of the image because the image file itself
>     stores only the bytecode.
>      > > >
>      > > > However its just came to my attention that the sources file
>     does not contain code that is recently installed in the image.
>      > > >
>      > > > So how exactly the sources file works and what it is ?
>      > >
>      > > The main perspective is from the object point of view: methods
>     are just objects like everything else. In order to be executable
>     they know their byte codes (which might be JIT compiled on
>     execution, but that is an implementation detail) and they know their
>     source code.
>      > >
>      > > Today we would probably just store the source code strings in
>     the image (maybe compressed) as memory is pretty cheap. But way back
>     when Smalltalk started, that was not the case. So they decided to
>     map the source code out to files.
>      > >
>      > > So method source code is a magic string (RemoteString) that
>     points to some position in a file. There are 2 files in use: the
>     sources file and the changes file.
>      > >
>      > > The sources file is a kind of snapshot of the source code of
>     all methods at the point of release of a major new version. That is
>     why there is a Vxy in their name. The source file never changes once
>     created or renewed (a process called generating the sources, see
>     PharoSourcesCondenser).
>      > >
>      > > While developing and creating new versions of methods, the new
>     source code is appended to another file called the changes file,
>     much like a transaction log. This is also a safety mechanism to
>     recover 'lost' changes.
>      > >
>      > > The changes file can contain multiple versions of a method.
>     This can be reduced in size using a process called condensing the
>     changes, see PharoChangesCondenser.
>      > >
>      > > On a new release, the changes file will be (almost) empty.
>      > >
>      > > HTH,
>      > >
>      > > Sven
>      > >
>      > >
>      > >
>      >
>      >
>
>
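
Sven's description above (method source as a pointer into one of two files, a read-only sources file and an append-only changes file) can be sketched roughly as follows. This is only an illustrative model in Python, not Pharo's actual implementation: SourcePointer is a hypothetical stand-in for RemoteString, and the file numbering, class names and method names are assumptions made for the example.

```python
import io
from dataclasses import dataclass

SOURCES, CHANGES = 0, 1  # assumed numbering of the two files


@dataclass(frozen=True)
class SourcePointer:
    """Hypothetical stand-in for RemoteString: which file, and where in it."""
    file_index: int
    position: int
    length: int


class SourceFiles:
    """Toy model of the sources file and the changes file (in-memory here)."""

    def __init__(self):
        self._files = {SOURCES: io.BytesIO(), CHANGES: io.BytesIO()}

    def _append(self, file_index, text):
        data = text.encode("utf-8")
        f = self._files[file_index]
        position = f.seek(0, io.SEEK_END)  # always write at the end
        f.write(data)
        return SourcePointer(file_index, position, len(data))

    def release(self, text):
        """Store source in the sources file (done when a release is generated)."""
        return self._append(SOURCES, text)

    def accept(self, text):
        """Store a new method version in the changes file, like a log entry."""
        return self._append(CHANGES, text)

    def source_at(self, pointer):
        """Follow a pointer and read the source text back."""
        f = self._files[pointer.file_index]
        f.seek(pointer.position)
        return f.read(pointer.length).decode("utf-8")

    def size(self, file_index):
        return len(self._files[file_index].getvalue())


files = SourceFiles()
# The "image" only keeps pointers, not the source strings themselves.
methods = {"double:": files.release("double: x ^ x * 2")}
sources_size = files.size(SOURCES)

old_pointer = methods["double:"]
methods["double:"] = files.accept("double: x ^ x + x")  # a new version is accepted
```

After the second version is accepted, the method points into the changes file, the sources file is untouched, and the old version is still readable through its old pointer, which is what makes recovering 'lost' changes possible.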



Re: Understanding the role of the sources file

kilon.alios
"I am getting off-topic, but since I have already written what follows, and I think it is informative, I am leaving it there :-)"

That's OK, my question was already answered by Sven's first reply anyway; some off-topic won't hurt.

"Python is a rather large language, with a large number of quirks that need to be preserved to provide full compatibility. It is also a language with advanced introspection, and very dynamic semantics, even more than Smalltalk in some aspects. And its main implementation (CPython) has very highly optimised data structures and memory management.

That makes it really difficult to implement correctly and to improve performance without slowing down other parts. Before PyPy, Google funded the Unladen Swallow project, which tried to produce a Python JIT using LLVM; this project was abandoned. Even PyPy needs to make frequent and expensive guard tests to ensure that invariants required for JIT are preserved. It does bring a significant speed boost (2x-10x), but at the expense of slow startup and increased memory footprint (~2x).

However, it seems clear that the long term future of Python lies in PyPy. In particular, they have high hopes to solve the lack of multicore support by implementing software transactional memory in the medium term, until hardware transactional memory is advanced enough to be usable there."

Well, I have watched a couple of talks by Guido (the creator of CPython, for those who don't know) on the subject, and I think his overall argument is "why should I give a damn?"

His line of argument is that Python offers such great support for wrapping C libraries that a JIT compiler is not that important. Python's initial mission was to be a scripting language for C/C++; the fact that it grew into an entity of its own is a side effect of its popularity. However, even lately Guido stresses the idea that Python should remain one of the best, if not the best, ways to wrap C code. So if you use and rely on C code that much, a JIT makes less and less sense. For example, over 50% of CPython's own code base is written in C. CPython developers are essentially C developers.

Pharo is not in the same boat. Pharo is a self-hosting environment, which means that a great deal of its code is written in Pharo, some of it in Slang, which is basically Smalltalk that compiles to C, and some of the really performance-dependent parts in pure C. Also, Pharo libraries are not C libraries wrapped for Pharo; they are just Pharo libraries. So it is a very different mentality.

So the reason why CPython does not have a JIT VM is that it does not really need one: Python coders already rely on performance-oriented Python libraries (see NumPy) that are written in C to do the heavy lifting. I don't think a JIT VM would make much of a difference anyway.

On the other hand, you rarely see someone here asking about the Pharo FFI or how to make a VM plugin; in 99% of cases people work on pure Pharo code.

On Wed, Jan 13, 2016 at 8:04 PM David Allouche <[hidden email]> wrote:
On 13 Jan 2016, at 18:14, Dimitris Chloupis <[hidden email]> wrote:

It is no accident that the Cog VM is mentioned in my documentation. There is another VM, called the Stack VM or old VM, which was the VM before Cog. It has no JIT, and it is the VM we still use for platforms that the Cog VM does not currently run on, like Raspberry Pi, Android and I think iOS too.


"But nowadays, JIT compiling byte-code to machine language are the norm rather than the exception, so I do not think it is worth mentioning."

Definitely not the norm: the Python and Ruby VMs do not use a JIT. Generally speaking, JIT compilers are rare for dynamic languages like Pharo. Python has PyPy, which is a JIT VM, but it is nowhere near as popular as CPython.

You are right. The JIT is worth mentioning there.


CPython people try to avoid a JIT because it makes the VM architecture much more complex. At least that's their argument.

I am getting off-topic, but since I have already written what follows, and I think it is informative, I am leaving it there :-)

Python is a rather large language, with a large number of quirks that need to be preserved to provide full compatibility. It is also a language with advanced introspection, and very dynamic semantics, even more than Smalltalk in some aspects. And its main implementation (CPython) has very highly optimised data structures and memory management.

That makes it really difficult to implement correctly and to improve performance without slowing down other parts. Before PyPy, Google funded the Unladen Swallow project, which tried to produce a Python JIT using LLVM; this project was abandoned. Even PyPy needs to make frequent and expensive guard tests to ensure that invariants required for JIT are preserved. It does bring a significant speed boost (2x-10x), but at the expense of slow startup and increased memory footprint (~2x).
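
To illustrate what a "guard test" means here, below is a minimal sketch in Python. It is not how PyPy is implemented; GuardedCall and Square are hypothetical names invented for the example. The idea is only that specialised code was produced under an assumption (this class, this method body), so that assumption must be re-checked on every call, because a dynamic language allows the method to be replaced at any time.

```python
class GuardedCall:
    """Toy model of a specialised call site protected by a guard."""

    def __init__(self, cls, method_name):
        self.cls = cls
        self.method_name = method_name
        # Snapshot of the method the "compiled" code was specialised for.
        self.cached = cls.__dict__[method_name]
        self.guard_failures = 0

    def __call__(self, obj):
        # Guard: same exact class, and the method has not been replaced.
        if (type(obj) is self.cls
                and self.cls.__dict__.get(self.method_name) is self.cached):
            return self.cached(obj)          # fast path, assumption still holds
        # Guard failed: fall back to the fully dynamic lookup.
        self.guard_failures += 1
        return getattr(obj, self.method_name)()


class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side


fast_area = GuardedCall(Square, "area")
square = Square(3)
before = fast_area(square)       # guard holds, cached method is used

Square.area = lambda self: 0     # the method is redefined at run time
after = fast_area(square)        # guard fails, dynamic lookup sees the new method
```

The guard costs a little on every call, and the more ways the language offers to change things at run time, the more such checks are needed.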

However, it seems clear that the long term future of Python lies in PyPy. In particular, they have high hopes to solve the lack of multicore support by implementing software transactional memory in the medium term, until hardware transactional memory is advanced enough to be usable there.

Another big advantage of Pharo is that it has the luxury of coming with a JIT included and well tested.

I guess the simplicity of the language makes it relatively easy to produce a JIT. This is indeed a very good thing.


The only popular JITs out there are Java's and .NET's; I don't know if JavaScript's V8 uses a JIT as well, but that's pretty much it. Bytecode, however, is very popular indeed.

V8 does use JIT. I believe it was the first widely deployed language implementation to use a tracing interpreter to JIT a language with latent typing. I believe all the other major JavaScript implementations have followed suit.

This is also the approach used by PyPy. Java and .Net have static typing, so they do not require a tracing interpreter to compile to machine code.

On Wed, Jan 13, 2016 at 6:43 PM David Allouche <[hidden email]> wrote:
On 13 Jan 2016, at 16:27, Dimitris Chloupis <[hidden email]> wrote:

"The virtual machine (VM) provides the environment where the Pharo system lives. It is different for each operating system and hardware architecture, and runs as a machine language executable in the operating system. It implements the details of managing memory, executing Pharo byte-code, and communicating with the world outside of the Pharo system: files, other operating system process, and the network."

No, the environment is the image; the VM is basically what its name says, a machine emulated by software. The vast majority of the tools, even the language itself, reside in the image. The VM is there so that the code can execute and interface with the underlying operating system. You could completely replace the VM, for example move it to the JVM, and the Pharo environment would still be intact.

That is indeed the idea I tried to convey by "where the Pharo system lives", but I see how that can be misunderstood. However, I try to avoid using "big words" like "abstraction".

How about this?

"The virtual machine (VM) provides the portable environment to execute Pharo images. Its implementation needs to be different for each operating system and hardware architecture, as it runs as a machine language executable in the operating system. It implements the details of managing memory, executing Pharo byte-code, and communicating with the world outside of the Pharo system: files, other operating system process, and the network. 


"Hi Dimitris,
your formulation "...Pharo bytecode...and convert it to machine code..."
is confusing to me insofar as "convert it to machine code" would
suggest to me that a compiler is at work here. David's "executing Pharo
byte-code" seems more understandable to me here."

That's correct, it is a compiler, a bytecode compiler: it compiles bytecode to machine code, and it does so while the code executes. This is why it is called JIT, which stands for Just-In-Time compilation: machine code is compiled just before the code is executed, so several optimizations can be applied that could not be known before the code runs. It is similar to Java's JIT compiler.

Note here that a compiler is not just something that produces machine code; a compiler can, for example, take one language and compile it to another language.

That's technically true. But most readers will probably unconsciously assume that a compiler is something that produces machine language. The document should be careful to avoid misunderstandings caused by such common assumptions.

As for the JIT, that is totally an implementation detail, and I believe it is only worth mentioning if you want to prevent the reader from assuming that Pharo is slow because its VM executes byte-code. But nowadays, JIT compiling byte-code to machine language is the norm rather than the exception, so I do not think it is worth mentioning.
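
For readers who want to see the difference between "executing byte-code" and "converting it" in concrete terms, here is a toy sketch in Python. It is not the Cog VM or any real Pharo byte-code set: the four opcodes, the Method class and the HOT_THRESHOLD value are all invented for the example, and a generated Python function stands in for machine code. The interpreter is a loop that reads and executes one byte code at a time; the JIT-like part translates a frequently called method once and caches the result.

```python
PUSH, ADD, MUL, RET = range(4)  # made-up opcodes for a tiny stack machine


def interpret(code, arg):
    """Classic interpreter: a loop that reads and executes byte codes."""
    stack = [arg]
    pc = 0
    while True:
        op = code[pc]
        if op == PUSH:
            stack.append(code[pc + 1])
            pc += 2
        elif op in (ADD, MUL):
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == ADD else a * b)
            pc += 1
        elif op == RET:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op}")


def translate(code):
    """Translate the byte codes once into a Python function.

    The generated function stands in for machine code in this sketch.
    """
    stack = ["arg"]
    pc = 0
    while True:
        op = code[pc]
        if op == PUSH:
            stack.append(repr(code[pc + 1]))
            pc += 2
        elif op in (ADD, MUL):
            b, a = stack.pop(), stack.pop()
            stack.append(f"({a} {'+' if op == ADD else '*'} {b})")
            pc += 1
        elif op == RET:
            return eval(f"lambda arg: {stack.pop()}")
        else:
            raise ValueError(f"unknown opcode {op}")


class Method:
    HOT_THRESHOLD = 3  # arbitrary: translate after this many calls

    def __init__(self, code):
        self.code = code
        self.calls = 0
        self.compiled = None  # cache for the translated version

    def __call__(self, arg):
        if self.compiled is not None:
            return self.compiled(arg)      # reuse the cached translation
        self.calls += 1
        if self.calls >= self.HOT_THRESHOLD:
            self.compiled = translate(self.code)
        return interpret(self.code, arg)   # otherwise interpret as usual


# (arg + 2) * 3
method = Method([PUSH, 2, ADD, PUSH, 3, MUL, RET])
results = [method(5) for _ in range(5)]
```

Both paths give the same answers; the only visible difference is that after a few calls the method carries a cached translation and the interpreter loop is no longer used for it.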



On Wed, Jan 13, 2016 at 4:58 PM Werner Kassens <[hidden email]> wrote:
Hi Dimitris,
your formulation "...Pharo bytecode...and convert it to machine code..."
is confusing to me insofar as "convert it to machine code" would
suggest to me that a compiler is at work here. David's "executing Pharo
byte-code" seems more understandable to me here.
werner

On 01/13/2016 02:22 PM, Dimitris Chloupis wrote:
> I assume you have never read an introduction to C++ then :D
>
> here is the final addition for the vm
>
> The virtual machine (VM) is the only component that is different for each
> operating system. The main purpose of the VM is to take Pharo bytecode,
> which is generated each time the user accepts a piece of code, and convert
> it to machine code in order to be executed, but also to generally handle
> low-level functionality like interpreting code, handling OS events (mouse
> and keyboard), calling C libraries, etc. Pharo 4 comes with the Cog VM, a
> very fast JIT VM.
>
> I think it's clear, precise and does not leave much room for confusion.
> Personally I think it's very important for the absolute beginner to have
> a strong foundation in the fundamentals of Pharo, and for things not to
> appear magical and "don't touch this".
>
> On Wed, Jan 13, 2016 at 2:54 PM Sven Van Caekenberghe <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>
>      > On 13 Jan 2016, at 13:42, Dimitris Chloupis
>     <[hidden email] <mailto:[hidden email]>> wrote:
>      >
>      > I mentioned bytecode because I don't want the user to see bytecode
>     at some point and say "What the hell is that?" I want the reader to
>     feel confident that they at least understand the basics of Pharo. Also,
>     I have seen very brief explanations of bytecode in similar Python
>     tutorials. Obviously I don't want to go any deeper than that, because
>     the user won't have to worry about the technical details on a daily
>     basis anyway.
>      >
>      > I agree that I could add a bit more to the VM description, similar
>     to what you posted. I am curious though: won't even the interpreter
>     generate machine code in order to execute the code, or does it use
>     existing machine code inside the VM binary?
>
>     No, a classic interpreter does not 'generate' machine code, it is
>     just a program that reads and executes byte codes in a loop, the
>     interpreter 'is' machine code.
>
>     No offence, but you see why I think it is important to not try to
>     use or explain too many complex concepts in the 1st chapter.
>
>     Learning to program is hard. It should first be done abstractly.
>     Think about Scratch. The whole idea of Smalltalk is to create a
>     world of interacting objects. (Even byte code is not a necessary
>     concept at all, for example, in Pharo, you can compile (translate)
>     to AST and execute that, I believe. There are Smalltalk
>     implementations that compile directly to C or JavaScript). Hell,
>     even 'compile' is not necessary, just 'accept'. See ?
>
>      > On Wed, Jan 13, 2016 at 2:25 PM Sven Van Caekenberghe
>     <[hidden email] <mailto:[hidden email]>> wrote:
>      > Sounds about right.
>      >
>      > Now, I would swap 1 and 4, as the image is the most important
>     abstraction.
>      >
>      > There is also a bit too much emphasis on (byte|source)code. This
>     is already pretty technical (it assumes you know what compilation is
>     and so on). But I understand it must be explained here, and you did
>     it well.
>      >
>      > However, I would start by saying that the image is a snapshot of
>     the object world in memory that is effectively a live Pharo system.
>     It contains everything that is available and that exists in Pharo.
>     This includes any objects that you created yourself, windows,
>     browsers, open debuggers, executing processes, all meta objects as
>     well as all representations of code.
>      >
>      > <sidenote>
>      > The fact that there is a sources and changes file is an
>     implementation artefact, not something fundamental. There are ideas
>     to change this in the future (but you do not have to mention that).
>      > </sidenote>
>      >
>      > Also, the VM not only executes code, it maintains the object
>     world, which includes the ability to load and save it from and to an
>     image. It creates a portable (cross platform) abstraction that
>     isolates the image from the particular details of the underlying
>     hardware and OS. In that role it implements the interface with the
>     outside world. I would mention that second part before mentioning
>     the code execution.
>      >
>      > The sentence "The purpose of the VM is to take Pharo bytecode that
>     is generated each time user accepts a piece of code and convert it
>     to machine code in order to be executed." is not 100% correct. It is
>     possible to execute the byte code without converting it. This is
>     called interpretation. JIT is a faster technique that includes
>     converting (some often used) byte code to machine code and caching that.
>      >
>      > I hope this helps (it is hard to write a 'definitive explanation'
>     as there are so many aspects to this and it depends on the
>     context/audience).
>      >
>      > > On 13 Jan 2016, at 12:58, Dimitris Chloupis
>     <[hidden email] <mailto:[hidden email]>> wrote:
>      > >
>      > > So I am correct that the image does not store the source code,
>     and that the source code is stored in sources and changes. The only
>     difference is that the objects have a source variable that points to
>     the right place for finding the source code.
>      > >
>      > > This is the final text; if you find anything incorrect, please
>     correct me.
>      > >
>      > > ---------------
>      > >
>      > > 1. The virtual machine (VM) is the only component that is
>     different for each operating system. The purpose of the VM is to
>     take Pharo bytecode that is generated each time the user accepts a
>     piece of code and convert it to machine code in order to be executed.
>     Pharo 4 comes with the Cog VM, a very fast JIT VM. The VM executable
>     is named:
>      > >
>      > > • Pharo.exe for Windows;
>      > > • pharo for Linux; and
>      > > • Pharo for OSX (inside a package also named Pharo.app).
>      > >
>      > > The other components below are portable across operating
>     systems, and can be copied and run on any appropriate virtual machine.
>      > >
>      > > 2. The sources file contains source code for parts of Pharo
>     that don’t change frequently. The sources file is important because
>     the image file format stores only the bytecode of live objects and not
>     their source code. Typically a new sources file is generated once
>     per major release of Pharo. For Pharo 4.0, this file is named
>     PharoV40.sources.
>      > >
>      > > 3. The changes file logs all source code modifications since
>     the .sources file was generated. This facilitates a per-method
>     history for diffs or reverting. That means that even if you don't
>     manage to save the image file on a crash, or you just forgot, you can
>     recover your changes from this file. Each release provides a near
>     empty file named for the release, for example Pharo4.0.changes.
>      > >
>      > > 4. The image file provides a frozen-in-time snapshot of a
>     running Pharo system. This is the file where the Pharo bytecode is
>     stored, and as such it is a cross-platform format. This is the heart of
>     Pharo, containing the live state of all objects in the system
>     (including classes and methods, since they are objects too). The
>     file is named for the release (like Pharo4.0.image).
>      > >
>      > > The .image and .changes files provided by a Pharo release are
>     the starting point for a live environment that you adapt to your
>     needs. Essentially the image file contains the compiler of the
>     language (not the VM), the language parser, the IDE tools and many
>     libraries, and acts a bit like a virtual operating system that runs
>     on top of a virtual machine (VM), similarly to ISO files.
>      > >
>      > > As you work in Pharo, these files are modified, so you need to
>     make sure that they are writable. The .image and .changes files are
>     intimately linked and should always be kept together, with matching
>     base filenames. Never edit them directly with a text editor, as
>     the .image file holds your live object runtime memory, which indexes
>     into the .changes file for the source. It is a good idea to keep a
>     backup copy of the downloaded .image and .changes files so you can
>     always start from a fresh image and reload your code. However, the
>     most efficient way of backing up code is to use a version control
>     system, which provides an easier and more powerful way to back up and
>     track your changes.
>      > >
>      > > The four main component files above can be placed in the same
>     directory, although it’s also possible to put the Virtual Machine
>     and sources file in a separate directory where everyone has
>     read-only access to them.
>      > >
>      > > If more than one image file is present in the same directory,
>     Pharo will prompt you to choose the image file you want to load.
>      > >
>      > > Do whatever works best for your style of working and your
>     operating system.
>      > >
>      > >
>      > >
>      > >
>      > >
>      > > On Wed, Jan 13, 2016 at 12:13 PM Sven Van Caekenberghe
>     <[hidden email] <mailto:[hidden email]>> wrote:
>      > >
>      > > > On 13 Jan 2016, at 10:57, Dimitris Chloupis
>     <[hidden email] <mailto:[hidden email]>> wrote:
>      > > >
>      > > > I was adding a short description of the sources file to
>     the UPBE. I always thought that the sources file is the file that
>     contains the source code of the image, because the image file itself
>     stores only the bytecode.
>      > > >
>      > > > However, it just came to my attention that the sources file
>     does not contain code that is recently installed in the image.
>      > > >
>      > > > So how exactly does the sources file work, and what is it?
>      > >
>      > > The main perspective is from the object point of view: methods
>     are just objects like everything else. In order to be executable
>     they know their byte codes (which might be JIT compiled on
>     execution, but that is an implementation detail) and they know their
>     source code.
>      > >
>      > > Today we would probably just store the source code strings in
>     the image (maybe compressed) as memory is pretty cheap. But way back
>     when Smalltalk started, that was not the case. So they decided to
>     map the source code out to files.
>      > >
>      > > So method source code is a magic string (RemoteString) that
>     points to some position in a file. There are 2 files in use: the
>     sources file and the changes file.
>      > >
>      > > The sources file is a kind of snapshot of the source code of
>     all methods at the point of release of a major new version. That is
>     why there is a Vxy in their name. The source file never changes once
>     created or renewed (a process called generating the sources, see
>     PharoSourcesCondenser).
>      > >
>      > > While developing and creating new versions of methods, the new
>     source code is appended to another file called the changes file,
>     much like a transaction log. This is also a safety mechanism to
>     recover 'lost' changes.
>      > >
>      > > The changes file can contain multiple versions of a method.
>     This can be reduced in size using a process called condensing the
>     changes, see PharoChangesCondenser.
>      > >
>      > > On a new release, the changes file will be (almost) empty.
>      > >
>      > > HTH,
>      > >
>      > > Sven
>      > >
>      > >
>      > >
>      >
>      >
>
>
