Hello all,
As I have mentioned before, I am thinking about creating a new VM for testing some of my ideas. One thing I noticed in the blue book was the tagging they use on the first (least significant) bit. By doing this you always have to shift the number before you can work with it. My thought was moving the tag to the last bit (most significant) and making 0 mean SmallPositiveInteger and 1 for OOP's. This way, once the number is identified to be a number nothing more need be done to use it and it doesn't need to be converted to be put back. As far as negative numbers, if I use a 31 bit header for my OOPS then the first bit of the header can be a flag. If the flag is 1 it's actually a negative number, otherwise it's a normal header. To clarify what I mean, here are some examples (first half byte in binary, rest in hex) 0000-0 00 00 ff - SmallPositiveInteger(255) 0111-f ff ff ff - SmallPositiveInteger(2147483647) 1xxx-x xx xx 01 - OOP(random memory address) 1xxx-x xx xx 02 - OOP(random memory address) 1xxx-x xx xx 03 - OOP(random memory address) (at address 0xxx-x xx xx 01) 1000-0 00 00 00 - SmallNegativeInteger(-2147483647) (at address 0xxx-x xx xx 02) 1111-f ff ff ff - SmallNegativeInteger(-1) (at address 0xxx-x xx xx 03) 0xxx-x xx xx xx - OO Header for a regular object So does anyone know any problems with this approach that make it unusable/unpractical? The only things I see are: 1) This representation is tied to 2's compliment, but are there any system that don't use that? 2) This system can only deal with 2 gigs worth of address pointers (can be hard coded to high or low but not both). How does any Smalltalk get around this? 3) Negative numbers require indirect memory access. Are negative numbers used so much this would be a problem? If so, one option would be to switch from 31 bits to 30 and use 2 bits for the tag (00 = SmallPositiveInteger, 11 = SmallNegativeInteger, 01 = Float?) I think VW is using two tag bits. 4) By tagging on the high side I have no option of using a value less then 32 bits (e.g. an 8-byte char). Perhaps characters can be done as my initial negative number solution as well by having a special OO header to represent them. Since the pointer points to the top, not the bottom maybe the header could be variable size ( i.e. less then 32 bits). So what do you all think? Any big problems I didn't mention? Or solutions to the problems I mentioned? Or papers about this kind of thing (since my google-fu has not found too much on the subject so far)? Thanks in advance, Jason _______________________________________________ Exupery mailing list [hidden email] http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/exupery |
On 8/4/07, Jason Johnson <[hidden email]> wrote:
There's a trade-off here: either small-integer arithmetic is made a little slower, or access to all objects is made a little slower. Nothing's free (unless an entirely different scheme is proposed). The reason for the current system is that it was built to run on the Motorola M68000 series of CPUs which were used in Apple's machines of the time. These CPUs, like most today, required aligned access, meaning that to read or write a 4-byte integer value in memory, it has to be stored at an address that is a multiple of 4. Actually, I seem to remember that the M68k required an alignment of only 2 bytes, but I could be wrong. Anyway. Suffice to say that to access an n-byte sized integer or floating value on modern hardware, it needs to be stored at a modulo-n address. The point is that all modern chips require alignment, with the exception of the x86 series, which copes with mis-aligned data but is considerably slower than if it were aligned (something like 10 times, ISTR). Check out http://en.wikipedia.org/wiki/Data_structure_alignment So for a 32-bit machine, we want to store pointer values (OOPs) or numeric values in registers and memory. Now, the hardware makes it a requirement for the lowest 2 bits to be zero for pointers to aligned (32-bit) data. So already, pointers are "tagged" by the hardware: any value with the bottom bit set is by definition not a valid pointer. The authors of squeak spotted this property and came up with the current scheme. There's another point to keep in mind with your scheme: what about address space? On my SPARC box, pointers can be smallish values: apg@breakfast: ~ cat t.c #include <stdio.h> #include <stdlib.h> int main( int argc, char * argv[] ) { char * p = (char*) malloc( 23 ); printf( "Address is %p\n", p ); } apg@breakfast: ~ gcc t.c apg@breakfast: ~ ./a.out Address is 20d60 However, I'm pretty sure that this number could just as easily be enormous, but your scheme would limit it to 2G of address space. It would also require setting the top-most bit on OOP-creation, testing it on access, and converting to a memory address. A test and shift for integers sounds simpler and cheaper (to me anyway). The current scheme can provide 31-bit integers as well as a 4GB address space for other objects, and at a pretty low cost. And 4GB is already a bit of a limitation -- I was involved in a project which needed squeak to work with 6+GB of objects. Anyone know how well the 64-bit Squeak works these days? Andrew _______________________________________________ Exupery mailing list [hidden email] http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/exupery |
In reply to this post by Jason Johnson-5
Jason Johnson writes:
> Hello all, > > As I have mentioned before, I am thinking about creating a new VM for > testing some of my ideas. One thing I noticed in the blue book was the > tagging they use on the first (least significant) bit. By doing this you > always have to shift the number before you can work with it. You don't need to shift the numbers for addition and subtraction, just make sure the bottom bits are 0. Even with the tag being 1, you only need to detag one argument and if that's a constant the detagging can be done at compile time. With multiplication and division only one argument needs to be shifted. > My thought was moving the tag to the last bit (most significant) and making > 0 mean SmallPositiveInteger and 1 for OOP's. This way, once the number is > identified to be a number nothing more need be done to use it and it doesn't > need to be converted to be put back. > > As far as negative numbers, if I use a 31 bit header for my OOPS then the > first bit of the header can be a flag. If the flag is 1 it's actually a > negative number, otherwise it's a normal header. > > To clarify what I mean, here are some examples (first half byte in binary, > rest in hex) > > 0000-0 00 00 ff - SmallPositiveInteger(255) > 0111-f ff ff ff - SmallPositiveInteger(2147483647) > > 1xxx-x xx xx 01 - OOP(random memory address) > 1xxx-x xx xx 02 - OOP(random memory address) > 1xxx-x xx xx 03 - OOP(random memory address) > > (at address 0xxx-x xx xx 01) > 1000-0 00 00 00 - SmallNegativeInteger(-2147483647) > > (at address 0xxx-x xx xx 02) > 1111-f ff ff ff - SmallNegativeInteger(-1) > > (at address 0xxx-x xx xx 03) > 0xxx-x xx xx xx - OO Header for a regular object > > So does anyone know any problems with this approach that make it > unusable/unpractical? The only things I see are: It was used by some Lisp systems in the mid 80's. I think CMU Common Lisp from memory. There are two disadvantages of using the top bit: 1) You need to control the address space to make sure all objects live. Controlling the address space to that degree is unlikely to be something that can be done portably. 2) You loose half your address space. > 1) This representation is tied to 2's compliment, but are there any system > that don't use that? > 2) This system can only deal with 2 gigs worth of address pointers (can be > hard coded to high or low but not both). How does any Smalltalk get around > this? Alignment. Words are aligned to 4 byte boundaries which is always required for performance and required by some CPU architectures. > 3) Negative numbers require indirect memory access. Are negative numbers > used so much this would be a problem? If so, one option would be to switch > from 31 bits to 30 and use 2 bits for the tag (00 = SmallPositiveInteger, 11 > = SmallNegativeInteger, 01 = Float?) I think VW is using two tag bits. > 4) By tagging on the high side I have no option of using a value less then > 32 bits (e.g. an 8-byte char). Perhaps characters can be done as my initial > negative number solution as well by having a special OO header to represent > them. Since the pointer points to the top, not the bottom maybe the header > could be variable size (i.e. less then 32 bits). > > So what do you all think? Any big problems I didn't mention? Or solutions > to the problems I mentioned? Or papers about this kind of thing (since my > google-fu has not found too much on the subject so far)? There is a large literature on garbage collection. These day's I'd try citeseer as well as Google. Scanning PLDI and OOPSLA proceedings from the late 70s and 80s will provide plenty of information if you've either got ACM membership or access to a university library. Bryce _______________________________________________ Exupery mailing list [hidden email] http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/exupery |
Thanks Bryce and Andrew for your quick and informative responses. Comments/questions below.
On 8/4/07, [hidden email] <[hidden email]> wrote:
Interesting. Well, I assume that in Smalltalk object access is much more often then integer access, but does anyone have any data that proves this? If it is the case then I guess that's the death of my MSB tagging idea. :) Doesn't Strongtalk use 0 tagged SmallIntegers? And if so, why? I looked in the source code a bit but I haven't figured out yet where that would be.
Well, I suppose I could shift the pointers up one (which would make the first bit zero) to reclaim the space, but as I believe it must be the case that object access is much more common then number access, that would be a bad trade-off. Alignment. Words are aligned to 4 byte boundaries which is always I see, I didn't realize the bottom 2 bits had to be zero. That answers a lot of my questions. :) There is a large literature on garbage collection. These day's I'd try I have the "Garbage Collection" book from Jones & Lins, but I will try to look on those sites as well. I just need to go to the "ACM digital library", no? _______________________________________________ Exupery mailing list [hidden email] http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/exupery |
I'd doubt that using the top bits has any significant advantages over using the bottom bits with a zero integer tag. Using the top bits adds the problem of getting the OS to allocate memory to fit around that policy. Jason Johnson writes: > Thanks Bryce and Andrew for your quick and informative responses. > Comments/questions below. > > On 8/4/07, [hidden email] <[hidden email]> wrote: > > > > > > You don't need to shift the numbers for addition and subtraction, just > > make sure the bottom bits are 0. Even with the tag being 1, you > > only need to detag one argument and if that's a constant the > > detagging can be done at compile time. > > > > With multiplication and division only one argument needs to be > > shifted. > > > > Interesting. Well, I assume that in Smalltalk object access is much more > often then integer access, but does anyone have any data that proves this? > If it is the case then I guess that's the death of my MSB tagging idea. :) > > Doesn't Strongtalk use 0 tagged SmallIntegers? And if so, why? I looked in > the source code a bit but I haven't figured out yet where that would be. Yes. The trick is detagging pointers can be done using offet addressing so that's free as well as detagging integers for addition and subtraction. So by using a zero tag there's no overhead for common operations. There's a single shift for multiplication and division but both these operations can be slow. > > > It was used by some Lisp systems in the mid 80's. I think CMU Common > > Lisp from memory. > > > > There are two disadvantages of using the top bit: > > 1) You need to control the address space to make sure all objects > > live. Controlling the address space to that degree is unlikely to > > be something that can be done portably. > > 2) You loose half your address space. > > > > Well, I suppose I could shift the pointers up one (which would make the > first bit zero) to reclaim the space, but as I believe it must be the case > that object access is much more common then number access, that would be a > bad trade-off. > > > Alignment. Words are aligned to 4 byte boundaries which is always > > required for performance and required by some CPU architectures. > > > > I see, I didn't realize the bottom 2 bits had to be zero. That answers a > lot of my questions. :) > > There is a large literature on garbage collection. These day's I'd try > > citeseer as well as Google. Scanning PLDI and OOPSLA proceedings from > > the late 70s and 80s will provide plenty of information if you've > > either got ACM membership or access to a university library. > > > > Bryce > > > > I have the "Garbage Collection" book from Jones & Lins, but I will try to > look on those sites as well. I just need to go to the "ACM digital > library", no? ACM digital library should work. I'm not a member so can't say for sure. Most papers are availible for free for recent stuff and the key classics are slowly appearing too. Bryce _______________________________________________ Exupery mailing list [hidden email] http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/exupery |
Free forum by Nabble | Edit this page |