While it is bad form to move a private discussion to (or back to) a public forum, some of these links might be interesting to people here, and I have been unable to send emails to Tobias after my initial reply. Attempts on Wednesday and on Friday made mcrelay.correio.biz complain that mx00.emig.gmx.net refused to talk to it, and an attempt from my old 1991 email account on Monday complained about the email address, though it was ok as far as I can tell.

Tobias wrote:
> Jecel wrote:
> > [new direction: emulate bytecodes and RISC-V]
>
> That's an interesting take.
>
> I can only watch from afar, but it's all interesting. (for example that guy
> who does RISC-V cpu in TTL chips: https://www.youtube.com/channel/UCBcljXmuXPok9kT_VGA3adg )

It is an interesting project. I was annoyed by his claim to have the first homebrew TTL 32 bit processor, since in the late 1990s a group of students at the MIT processor design course implemented the Beta processor in TTLs instead of using FPGAs like all other groups (before or since). Sadly, all information about this has been eliminated from the web and can't even be found in archive.org.

I tried to get the local universities to teach RISC-V to their students instead of their own educational RISC processors, but they are too emotionally attached to their designs.

> Sounds reasonable. Let's have them know dynamic languages are also still there ;)
> (I mean, you're very familiar with both Smalltalk and Self...)

Mario Wolczko has been involved in Java since the late 1990s, but he was part of the Self group before that and had created the Mushroom Smalltalk computer earlier still.

http://www.wolczko.com/

Boris Shingarov is currently involved with Java but has given a lot of talks about Smalltalk VMs and was involved in Squeak back in the OS/2 days.

http://shingarov.com/

With me, that was 3 out of 6 people at the meeting representing the Smalltalk viewpoint. We shall see if that will have any practical effect.

> The TLB is somewhat maintained by the CPU to manage the translation of virtual addresses to physical ones.
>
> I can imagine something similar, like a branch, that upon return, updates a field
> in a PIC buffer, such that the next time the branch is only taken if a register (eg, class of the object)
> is different or so.

Ok, Mario actually mentioned that with today's advanced branch prediction hardware we might want to re-evaluate PICs. In this case you wouldn't be using the TLBs but the BTB (Branch Target Buffer) hardware.

https://www.slideshare.net/lerruby/like-2014214

Mario might have actually been thinking about Urs Hölzle's ECOOP 95 paper, which was a slightly different subject.

http://hoelzle.org/publications/ecoop95-dispatch.pdf

They were looking at the different kinds of software implementation of method dispatch (not only PICs) and the effects of processors executing more and more instructions per clock cycle. That might make a scheme that is bad for a simple RISC (due to many tests, for example) actually work well on an advanced out-of-order processor (due to the tests being "free" since they execute in parallel with the main code). They didn't look at branch prediction hardware, but it certainly would have a huge impact. Several of the later papers focused on branch prediction:

http://hoelzle.org/publications.html

> > For SiliconSqueak I actually had two different PIC instructions. They
> > modified how the instruction cache works. Normally the instruction cache
> > is accessed by hashing the 32 bit value of the PC except for the lowest
> > bits which select a byte in the cache line, but after a PIC instruction
> > the hash used a 64 bit value that combined the PC (all bits) and the
> > pointer to the receiver's class. The resulting cache line was fetched
> > and instructions executed in sequence even though the PC didn't change.
> > Any branch or call instruction would restart normal execution at the new
> > PC.
>
> Sounds neat!
>
> > So a PIC entry takes up exactly one cache line. A PIC can have as many
> > entries as needed and the instruction takes the same time to execute no
> > matter how many entries there are (not taking into account cache
> > misses).
>
> Wow that's incredible.
>
> > The second PIC instruction works exactly like the first but it supplies
> > a different value to be used in place of the current PC. That allows
> > different call sites to share PIC entries if needed, though that might
> > be more complicated than it is worth.
>
> Maybe. What I like about PICs per send site is that you can essentially use them
> as data source for dynamic feedback (what "types" were actually seen at this send site?)
> and one probably would need some instructions to fetch those infos from the PIC.

One of the papers in that list is the 1997 technical report "The Space Overhead of Customization". One of the reasons that Java won over Self was that its simple interpreter ran on the 8MB machines that most of Sun's customers had, while Self needed 24MB workstations, which were rare (but would be very common just two years later). Part of that was due to compiling a new version of native code for every different type of receiver even if the different versions didn't really help.

My idea of allowing PICs to be optionally shared was that this would allow customization to be limited in certain cases to save memory. It would cause a loss of information about types seen at a call site, but that doesn't always have a great impact on performance.

-- Jecel
|
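The cache-based PIC lookup described in the post above can be sketched as a toy model. This is only an illustration under stated assumptions, not SiliconSqueak's actual design: the names `ToyICache`, `pic_key`, `normal_key` and the 32-byte line size are hypothetical, the cache is modelled as a Python dictionary (so conflicts and evictions are ignored), and a string stands in for the instructions held in one cache line.

```python
# Toy model of a PIC lookup done by the instruction cache.
# Everything here (names, sizes, the dict-based cache) is an assumption
# made for illustration; it is not the real SiliconSqueak hardware.

LINE_BYTES = 32  # assumed cache line size


def normal_key(pc: int) -> int:
    """Normal fetch: use the PC minus the low bits that select a byte in the line."""
    return pc // LINE_BYTES


def pic_key(pc: int, class_ptr: int) -> int:
    """PIC fetch: a 64-bit value built from all 32 PC bits and the class pointer."""
    return ((pc & 0xFFFFFFFF) << 32) | (class_ptr & 0xFFFFFFFF)


class ToyICache:
    def __init__(self):
        # key -> contents of one cache line (a string stands in for the code)
        self.lines = {}

    def install_pic_entry(self, site: int, class_ptr: int, line: str) -> None:
        """Each PIC entry occupies exactly one cache line."""
        self.lines[("pic", pic_key(site, class_ptr))] = line

    def pic(self, pc: int, class_ptr: int):
        """First PIC instruction: key on (current PC, receiver class).

        One probe regardless of how many entries the PIC has.
        Returns None to model a cache miss.
        """
        return self.lines.get(("pic", pic_key(pc, class_ptr)))

    def pic_shared(self, shared_site: int, class_ptr: int):
        """Second PIC instruction: same probe, but an explicit value replaces the PC,
        so several call sites can reach the same entries."""
        return self.pic(shared_site, class_ptr)
```

In this model a per-site PIC keeps its own entries (and therefore its own type feedback), while two sites that both call `pic_shared` with the same `shared_site` value see the same entries, which is the memory-for-information trade-off discussed in the post.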
Just had to interject... are you sure about the year and RAM sizes below? If correct, was this a lowest common denominator RAM size or perhaps a report from several years earlier?

I remember in 1997 (might have been early '98) buying a Sun Ultrasparc 2 with 256MB RAM for work (for a workstation, not our servers which had far more) and having 192MB RAM in one of my machines at home. Seems strange that 'most' of Sun's customers would be at 8 meg in that timeframe as 16-32 was fairly common even on PCs on the low end of what I was working with.

On Tue, Jul 24, 2018, 7:18 PM Jecel Assumpcao Jr. <[hidden email]> wrote:
|
On 25 July 2018 at 08:00, Phil B <[hidden email]> wrote:
>
> On Tue, Jul 24, 2018, 7:18 PM Jecel Assumpcao Jr. <[hidden email]> wrote:
>>
>> One of the papers in that list is the 1997 technical report "The Space
>> Overhead of Customization". One of the reasons that Java won over Self
>> was that its simple interpreter ran on 8MB machines that most of Sun's
>> customers had while Self needed 24MB workstations which were rare (but
>> would be very common just two years later). Part of that was due to
>> compiling a new version of native code for every different type of
>> receiver even if the different versions didn't really help.
>>
>
> Just had to interject... are you sure about the year and RAM sizes below? If correct, was this a lowest common denominator RAM size or perhaps a report from several years earlier?
>
> I remember in 1997 (might have been early '98) buying a Sun Ultrasparc 2 with 256MB RAM for work (for a workstation, not our servers which had far more) and having 192MB RAM in one of my machines at home. Seems strange that 'most' of Sun's customers would be at 8 meg in that timeframe as 16-32 was fairly common even on PCs on the low end of what I was working with.

I got curious...

1993 = 66-MHz machine 16MB RAM
1997 = 233-MHz machine 64MB RAM
1999 = 500-MHz machine 128MB RAM

https://www.pcmag.com/article2/0,2817,2390914,00.asp

cheers -ben
|
In reply to this post by Jecel Assumpcao Jr
Hi,

> One of the papers in that list is the 1997 technical report "The Space
> Overhead of Customization". One of the reasons that Java won over Self
> was that its simple interpreter ran on 8MB machines that most of Sun's
> customers had while Self needed 24MB workstations which were rare (but
> would be very common just two years later). Part of that was due to
> compiling a new version of native code for every different type of
> receiver even if the different versions didn't really help.

Note that Javascript people had the same issue and they built many workarounds, such as fine-tuning transitions between hidden classes/maps or sharing the code between hidden classes/maps.

In addition, the Self VM featured a non-optimizing and an optimizing JIT but no interpreter. The interpreter is the bit that allows saving memory there (rarely used code can be kept in the form of bytecode, which is usually ~5x more compact than native code, and unused compiled native code can be GC'ed).

So there's a lot to argue there. I don't think Java won because of technical issues but political ones; the Self VM could definitely have been patched to fix this issue.

> My idea of allowing PICs to be optionally shared was that this would
> allow customization to be limited in certain cases to save memory. It
> would cause a loss of information about types seen at a call site, but
> that doesn't always have a great impact on performance.

We do that in Cog for openPICs (polymorphism of 6+ cases). In any case the optimizing JIT rarely optimizes more than 2 cases (though in some VMs such as JSCore it can optimize up to 8 cases), so sharing openPICs makes sense.

For closedPICs, i.e. the jump tables, I don't think they represent a lot of memory and indeed the type information is relevant, so I am not sure about sharing those.

On Wed, Jul 25, 2018 at 1:18 AM, Jecel Assumpcao Jr. <[hidden email]> wrote:
|
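The closed/open PIC distinction in the post above can be sketched roughly as follows. This is a simplified model and not Cog's implementation: the class names `SendSite` and `OpenPIC`, the helper `full_lookup`, the `SHARED_OPEN_PICS` table and the threshold `MAX_CLOSED_CASES = 6` are all assumptions introduced for illustration, and Python's `getattr` stands in for a real method lookup.

```python
# Sketch of per-site closed PICs versus open PICs shared between send sites.
# All names and the threshold below are hypothetical; this is not Cog's code.

MAX_CLOSED_CASES = 6  # assumed size limit for a closed PIC

SHARED_OPEN_PICS = {}  # selector -> OpenPIC, shared by all sites using that selector


def full_lookup(cls, selector):
    """Stand-in for a full method lookup through the class hierarchy."""
    return getattr(cls, selector)


class OpenPIC:
    """Megamorphic case: one lookup cache per selector, with no per-site feedback."""

    def __init__(self, selector):
        self.selector = selector
        self.cache = {}  # class -> method

    def find(self, cls):
        method = self.cache.get(cls)
        if method is None:
            method = full_lookup(cls, self.selector)
            self.cache[cls] = method
        return method


class SendSite:
    """One call site; its closed PIC doubles as type feedback for that site."""

    def __init__(self, selector):
        self.selector = selector
        self.closed_cases = []  # list of (class, method) pairs seen at this site
        self.open_pic = None

    def send(self, receiver, *args):
        cls = type(receiver)
        if self.open_pic is not None:
            return self.open_pic.find(cls)(receiver, *args)
        # Closed PIC: a short sequence of class tests, like a jump table.
        for seen_cls, method in self.closed_cases:
            if seen_cls is cls:
                return method(receiver, *args)
        method = full_lookup(cls, self.selector)
        if len(self.closed_cases) < MAX_CLOSED_CASES:
            self.closed_cases.append((cls, method))
        else:
            # Too polymorphic: switch this site to the shared open PIC.
            self.open_pic = SHARED_OPEN_PICS.setdefault(
                self.selector, OpenPIC(self.selector)
            )
        return method(receiver, *args)
```

In this sketch the closed cases stay private to each site, so an optimizer could still read them as type feedback, while every site that becomes megamorphic for the same selector ends up pointing at a single `OpenPIC` object, so the memory cost of the megamorphic case is paid once per selector rather than once per site.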
In reply to this post by Jecel Assumpcao Jr
Hi Jecel

> On 25.07.2018, at 01:18, Jecel Assumpcao Jr. <[hidden email]> wrote:
>
>
> While it is bad form to move a private discussion to (or back to) a
> public forum, some of these links might be interesting to people here
> and I have been unable to send emails to Tobias after my initial reply.
> An attempt on Wednesday and on Friday made mcrelay.correio.biz complain
> that mx00.emig.gmx.net refused to talk to it and an attempt from my old
> 1991 email account on Monday complained about the email address though
> it was ok as far as I can tell.

Bummer. Sorry for my ISP…
Then let's continue here.

>
> Tobias wrote:
>> Jecel wrote:
>>> [new direction: emulate bytecodes and RISC-V]
>>
>> That's an interesting take.
>>
>> I can only watch from afar, but it's all interesting. (for example that guy
>> who does RISC-V cpu in TTL chips: https://www.youtube.com/channel/UCBcljXmuXPok9kT_VGA3adg )
>
> It is an interesting project. I was annoyed by his claim to have the
> first homebrew TTL 32 bit processor since in the late 1990s a group of
> students at the MIT processor design course implemented the Beta
> processor in TTLs instead of using FPGAs like all other groups (before
> or since). Sadly, all information about this has been eliminated from
> the web and can't even be found in archive.org.

Yea, but as far as I can see that person is fine with being corrected, so maybe someone should tell him? :)

>
> I tried to get the local universities to teach RISC-V to their students
> instead of their own educational RISC processors but they are too
> emotionally attached to their designs.
>
>> Sounds reasonable. Let's have them know dynamic languages are also still there ;)
>> (I mean, you're very familiar with both Smalltalk and Self...)
>
> Mario Wolczko has been involved in Java since the late 1990s but was
> part of the Self group before that and had created the Mushroom
> Smalltalk computer before that.
>
> http://www.wolczko.com/

right

>
> Boris Shingarov is currently involved with Java but has given a lot of
> talks about Smalltalk VMs and was involved in Squeak back in the OS/2
> days.
>
> http://shingarov.com/
>
> With me, that was 3 out of 6 people at the meeting representing the
> Smalltalk viewpoint. We shall see if that will have any practical
> effect.

That sounds like great input to that project.

>
>> The TLB is somewhat maintained by the CPU to manage the translation of virtual addresses to physical ones.
>>
>> I can imagine something similar, like a branch, that upon return, updates a field
>> in a PIC buffer, such that the next time the branch is only taken if a register (eg, class of the object)
>> is different or so.
>
> Ok, Mario actually mentioned that with today's advanced branch
> prediction hardware we might want to re-evaluate PICs. In this case you
> wouldn't be using the TLBs but the BTB (Branch Target Buffer) hardware.
>
> https://www.slideshare.net/lerruby/like-2014214
>
> Mario might have actually been thinking about Urs Hölzle's ECOOP 95
> paper, which was a slightly different subject.
>
> http://hoelzle.org/publications/ecoop95-dispatch.pdf
>
> They were looking at the different kinds of software implementation of
> method dispatch (not only PICs) and the effects of processors executing
> more and more instructions per clock cycle. That might make a scheme
> that is bad for a simple RISC (due to many tests, for example) actually
> work well on an advanced out-of-order processor (due to the tests being
> "free" since they execute in parallel with the main code). They didn't
> look at branch prediction hardware, but it certainly would have a huge
> impact. Several of the later papers focused on branch prediction:
>
> http://hoelzle.org/publications.html

Today I learned about BTB…

>
>>> For SiliconSqueak I actually had two different PIC instructions. They
>>> modified how the instruction cache works. Normally the instruction cache
>>> is accessed by hashing the 32 bit value of the PC except for the lowest
>>> bits which select a byte in the cache line, but after a PIC instruction
>>> the hash used a 64 bit value that combined the PC (all bits) and the
>>> pointer to the receiver's class. The resulting cache line was fetched
>>> and instructions executed in sequence even though the PC didn't change.
>>> Any branch or call instruction would restart normal execution at the new
>>> PC.
>>
>> Sounds neat!
>>
>>> So a PIC entry takes up exactly one cache line. A PIC can have as many
>>> entries as needed and the instruction takes the same time to execute no
>>> matter how many entries there are (not taking into account cache
>>> misses).
>>
>> Wow that's incredible.
>>
>>> The second PIC instruction works exactly like the first but it supplies
>>> a different value to be used in place of the current PC. That allows
>>> different call sites to share PIC entries if needed, though that might
>>> be more complicated than it is worth.
>>
>> Maybe. What I like about PICs per send site is that you can essentially use them
>> as data source for dynamic feedback (what "types" were actually seen at this send site?)
>> and one probably would need some instructions to fetch those infos from the PIC.
>
> One of the papers in that list is the 1997 technical report "The Space
> Overhead of Customization". One of the reasons that Java won over Self
> was that its simple interpreter ran on 8MB machines that most of Sun's
> customers had while Self needed 24MB workstations which were rare (but
> would be very common just two years later). Part of that was due to
> compiling a new version of native code for every different type of
> receiver even if the different versions didn't really help.
>
> My idea of allowing PICs to be optionally shared was that this would
> allow customization to be limited in certain cases to save memory. It
> would cause a loss of information about types seen at a call site, but
> that doesn't always have a great impact on performance.

That makes a lot of sense. Maybe there's a way to have both variants…

Best regards
-Tobias
|
In reply to this post by Ben Coman
Ben Coman wrote on Wed, 25 Jul 2018 12:36:18 +0800
> On 25 July 2018 at 08:00, Phil B wrote:
> > On Tue, Jul 24, 2018, 7:18 PM Jecel Assumpcao Jr. wrote:
> >>
> >> One of the papers in that list is the 1997 technical report "The Space
> >> Overhead of Customization". One of the reasons that Java won over Self
> >> was that its simple interpreter ran on 8MB machines that most of Sun's
> >> customers had while Self needed 24MB workstations which were rare (but
> >> would be very common just two years later). Part of that was due to
> >> compiling a new version of native code for every different type of
> >> receiver even if the different versions didn't really help.
> >>
> >
> > Just had to interject... are you sure about the year and RAM sizes below?
> > If correct, was this a lowest common denominator RAM size or perhaps a
> > report from several years earlier?

Self 3/4 was indeed from 1993 to 1994, while the paper that did that space analysis was from 1997. The decision to drop Self and TCL in favor of Java was from late 1994 and cited the installed base at that time, not the machines that were being sold then.

> > I remember in 1997 (might have been early '98) buying a Sun Ultrasparc 2
> > with 256MB RAM for work (for a workstation, not our servers which had far
> > more) and having 192MB RAM in one of my machines at home. Seems
> > strange that 'most' of Sun's customers would be at 8 meg in that timeframe
> > as 16-32 was fairly common even on PCs on the low end of what I was working
> > with.
>
> I got curious...
> 1993 = 66-MHz machine 16MB RAM
> 1997 = 233-MHz machine 64MB RAM
> 1999 = 500-MHz machine 128MB RAM
> https://www.pcmag.com/article2/0,2817,2390914,00.asp

When I talk to students about researching computer history, I always caution them to take old magazines with a grain of salt. I point out that these tend to focus on the very best products and not what typical people actually bought. If they looked at car magazines, for example, they might conclude that in 1978 everyone was driving around in Ferraris and BMWs. And like I said above, you have to take into account that most computers at any moment in time are older models and not what is being sold then.

Though we like to make graphs that show smooth growth in computer history, that isn't always the case. For disks, for example, we got stuck with 5 and 10MB for nearly half a decade before stepping up to 20MB (and then a quick jump to 30MB with the switch from MFM to RLL). Soon after that an exponential growth started that is still going on today.

In the same way, two factors slowed down memory size growth between the late 1980s and early 1990s. One was the antidumping actions by the US against Japanese companies. The other was that on PCs people didn't have software that could use more than 1MB even if they were buying machines with 4MB. Workstations and Macs (and Ataris and Amigas) didn't have that problem, nor did PCs after Windows 3.1 became popular.

-- Jecel
|
In reply to this post by Clément Béra
Clément Bera wrote on Wed, 25 Jul 2018 08:12:53 +0200
> > One of the papers in that list is the 1997 technical report "The Space
> > Overhead of Customization". One of the reasons that Java won over Self
> > was that its simple interpreter ran on 8MB machines that most of Sun's
> > customers had while Self needed 24MB workstations which were rare (but
> > would be very common just two years later). Part of that was due to
> > compiling a new version of native code for every different type of
> > receiver even if the different versions didn't really help.
>
> Note that Javascript people had the same issue and they built many
> workarounds, such as fine-tuning transitions between hidden classes/maps
> or sharing the code between hidden classes/maps.

Since changing an object is a more lightweight operation in Javascript than in Self, it makes a higher performance implementation even more complicated. But the stuff that helps Javascript can also be used to improve Self.

> In addition, the Self VM featured a non optimizing and an optimizing JIT but
> no interpreter. The interpreter is the bit allowing to save memory there
> (code rarely used can be kept in the form of bytecode, which is usually
> ~5x more compact than native code, and unused compiled native code can be
> GC'ed).

That is one of the things David Ungar changed from Self 4.0 to 4.1, long after the project was officially over. That was only released in 2002 or so. I am not sure if the current Self includes an interpreter, but its bytecodes are designed to allow one.

> So there's a lot to argue there. I don't think Java won because of technical
> issues but political ones, the Self VM could definitely have been patched
> to fix this issue.

Exactly! For those who are not aware of the history, the Sun research group developed an interesting operating system called SpringOS based on Java. When they started publishing papers, Sun's customers started to get nervous because they had just suffered through the transition from SunOS (BSD) to Solaris (Unix System V) and told Sun they would switch to some competitor instead of dealing with SpringOS. So Sun cancelled the research project and made it very clear that they were "Solaris forever". This made marketing look at their language situation, so they decided to be "Java only forever" and kill everything else.

About the memory excuse, I pointed out that Java only had a trivial interpreter running simple demos. Their only large application was the HotJava web browser, and David Ungar implemented one with equivalent functionality in Self in a single afternoon (and the browser only added a few KB to the size of the image file). I claimed that by the time Java did half as much as Self it would be twice as large. I was wrong - it was many times larger.

> > My idea of allowing PICs to be optionally shared was that this would
> > allow customization to be limited in certain cases to save memory. It
> > would cause a loss of information about types seen at a call site, but
> > that doesn't always have a great impact on performance.
>
> We do that in the Cog for openPIC (Polymorphism of 6+ cases). In any
> case the optimizing JIT rarely optimizes more than 2 cases (though in
> some VM such as JSCore it can optimize up to 8 cases), so sharing
> openPICs makes sense.
> For closedPIC, i.e. the jump tables, I don't think they represent a lot of
> memory and indeed the type information is relevant, so, not sure about
> sharing those.

I don't think sharing is a good idea, but I didn't want to make it impossible.

Tobias Pape wrote:
> That makes a lot of sense. Maybe there's a way to have both variants?

Yes, I had two different instructions. But like I said above, I didn't really expect to use the second variation (shared PICs) but wanted to have it in case somebody needed it someday. I do know that is not how you are supposed to design a processor :-) Or even agile software, where the motto is "you are not going to need it".

> > [TTL RISC-V is neat, but not first]
> Yea but as far as I can see that person is fine with being corrected, so
> maybe someone should tell him? :)

I don't like to claim stuff without being able to show proof. And it isn't really important. But saying something was first in computer history is pretty complicated.

-- Jecel
|
In reply to this post by Jecel Assumpcao Jr
Jecel,

Sorry I can't do inline right now (stupid Gmail Android client must have 'fixed' something)...

Ok, I wasn't clear that the study had that much lag (my fault for not reading it). What threw me for a loop (and still does) was the claim that the 'average' Sun machine had that little RAM at any time in the 90s. I don't ever remember using a non-x86 Unix system in the 90s with less than 16M. And I went into 1990 on 16MB PCs (I was a day 1 adopter of Win 3.0 as it was the easiest sell to get business people off of DOS... to all human beings who have suffered as a result: sorry, but as bad as Windows is/was, DOS was worse).

An aside and minor correction to your point re: PCs: *most* people didn't have software that could use more than 1m+64k (himem weird feature). But there were commonly available exceptions. For example: iirc Lotus 123 v3 was still a DOS app but was able to use extended memory (subject to the constraint that only labels could go into extended memory), so you could effectively use up to 2-3 meg in a DOS-based spreadsheet. And there were CAD systems and databases that could directly use up to 4-8 meg or so via DOS extenders. There were also crude task switchers, TSRs, etc. that clumsily allowed using more RAM. It was pretty hellish, but possible to break the 1 meg barrier in the DOS days. And there was also Windows 286/386 pre-1990. (It was bizarre: you couldn't convince most business people to look at anything that wasn't DOS or later Windows, usually citing cost as a reason (really it was fear of the unknown), but they'd spend a small fortune having their config.sys/autoexec.bat optimized to find an extra 50-100k of low memory so that they could keep limping along in DOS.)

No doubt RAM was ridiculously expensive in the 80s and 90s (I remember paying ~$1000 for the last 128 meg in that 192 meg machine) but Sun machines weren't cheap either. That is why I was surprised that apparently so many people skimped on RAM in the workstation world... seems like putting a 4 cylinder engine into a high end luxury car. The low end of the PC world, on the other hand, has always been terrible for skimping on everything.

Thanks,
Phil

On Wed, Jul 25, 2018, 10:27 AM Jecel Assumpcao Jr. <[hidden email]> wrote:
|
Hi Phil,
On Sun, Jul 29, 2018 at 12:38 PM, Phil B <[hidden email]> wrote:
I went to work at Rutherford Appleton Lab in 1978. In 1979 or 80 the Bubble Chamber Research Group bought a DEC 11/780, and in '80, '81 or thereabouts we added, gasp, a whole extra megabyte to the machine. We got a quote from DEC for the megabyte, which was, you guessed it, £11,780. Needless to say we ended up buying the memory from Megatech for considerably less, but it was still several thousand pounds. Gulp.
_,,,^..^,,,_ best, Eliot |
> > On Sun, Jul 29, 2018 at 12:38 PM, Phil B <[hidden email]> wrote:
[snip]
> And I went into 1990 on 16mb PCs (I was a day 1 adopter of Win 3.0 as it was the easiest sell to get business people off of DOS... to all human beings who have suffered as a result: sorry, but as bad as Windows is/was, DOS was worse).

In the early 90s when I was at ParcPlace we had all sorts of fun with customers that screamed bloody murder about having to fit an *entire megabyte* of memory in their PCs. Something about it reducing the amount of money available for executive bonuses IIRC.

To develop the Windows VW VM back then we had to have multiple autoexec.bat files so that one could test and run a (really crappy) debugger by not starting up the networking drivers. Then after it crashed, swap the autoexec.bats around, reboot, wait.... wait.... connect to the file server, move files around, edit C code, move files around, swap the autoexec.bats around, reboot, wait... wait... run the (Green Hills? Lattice?) compiler, move the autoexec.bats around, reboot, wait, run the debugger, etc etc etc. It was *awful*. At least the Macs could avoid all that stuff, even if the cost was using MacApp (or MacPal, or something like that) and having a damaged mind as a result.

Oh, and when I started at PP there was no makefile for the Windows VM, so I had to hold my nose and cope with writing a makefile for the horrific 'nmake' system. Blech.

I did however have my own personal Acorn Archimedes to play with, with a colossal 4Mb of ram and a *20Mb* disk! And networking! And it was several times faster than a 386! And *not Windows*. Still got it somewhere, though not powered up since maybe 2000. These days a Raspberry Pi runs Smalltalk about 500 times as fast for less than the cost of the power lead back then.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Security announcement - as of next week, passwords will be entered in Morse code.
|
In reply to this post by Eliot Miranda-2
I would have been so envious... $5-10k for bragging rights alone might have been worth it back in '81! 😀 (I'm guesstimating that the exchange rate with the USD was 1.5-2.0 as that's vaguely what I remember it bouncing around back then)

On Sun, Jul 29, 2018, 4:42 PM Eliot Miranda <[hidden email]> wrote:
|
In reply to this post by timrowledge
I must have been lucky with my clients... never had too much trouble getting them to pony up for RAM and disk... just as long as it was running DOS/Windows... ugh. Getting them to fund innovative projects, on the other hand, was like pulling teeth.

You're describing pretty much the exact DOS hell I lived with too. (The config.sys/autoexec.bat reboot driver dance etc.) Couldn't get the majority of customers to consider Unix or Macs, so Windows became the de facto solution, unfortunately.

On Sun, Jul 29, 2018, 5:13 PM tim Rowledge <[hidden email]> wrote:
|