Hello everyone,

I have a small question about NativeBoost: how does the "+" operator, when applied to a pointer, translate into NativeBoost code?

Because a bit of actual code is easier to understand, here is what I'd like to do in Pharo:

...

int i, j = 0;
int *data = malloc(1000 * sizeof(int));
int *newData = malloc(50 * sizeof(int));

// Fill the initial data
for (i = 0; i < 1000; i++) {
    data[i] = i;
}

// Copy the desired chunks into the new buffer
for (i = 0; i < 5; i++) {
    memcpy(newData + j * 10, data + 200 + j * 30, 10 * sizeof(int));
    j++;
}

free(data);

...

Here, basically, I'll get in my buffer chunks of 10 integers starting at 200, with an offset of 30 between chunks, five times (200 201 202 ... 208 209 230 231 ... 238 239 260 ... 328 329). I am okay with malloc, memcpy and free, but I don't know how to handle the "+" operator in my memcpy call.

Thank you,

Matthieu |
> On 08 Jun 2015, at 4:41 , Matthieu Lacaton <[hidden email]> wrote:
>
> Hello everyone,
>
> I have a small question about NativeBoost: how does the "+" operator, when applied to a pointer, translate into NativeBoost code?
>
> To give a bit of context, what I want to do is to reallocate some non-contiguous bytes in memory to a buffer. Basically, I have an array of integers in a buffer and I want to copy some chunks of it into another buffer. The chunks are always the same size, and the offset between each chunk is always the same too.
>
> Because a bit of actual code is easier to understand, here is what I'd like to do in Pharo:
>
> ...
>
> int i, j = 0;
> int *data = malloc(1000 * sizeof(int));
> int *newData = malloc(50 * sizeof(int));
>
> // Fill the initial data
> for (i = 0; i < 1000; i++) {
>     data[i] = i;
> }
>
> // Copy the desired chunks into the new buffer
> for (i = 0; i < 5; i++) {
>     memcpy(newData + j * 10, data + 200 + j * 30, 10 * sizeof(int));
>     j++;
> }
>
> free(data);

You can do relative addressing like this:

(destReg ptr: dataSize) + offsetReg + constant

So with offset registers containing j * 10 and j * 30, you might end up with an unrolled inner loop (barring using any fancier longer-than-int moves) like:

0 to: 9 do: [:constOffset |
    asm mov: (destReg ptr: currentPlatform sizeOfInt) + dstOffsetReg + constOffset
        with: (srcReg ptr: currentPlatform sizeOfInt) + 200 + srcOffsetReg + constOffset]

If the range of j is constant, you can just as easily unroll the whole thing in a similarly compact fashion, space and sensibilities permitting:

0 to: 4 do: [:j |
    0 to: 9 do: [:constOffset |
        asm mov: (destReg ptr: currentPlatform sizeOfInt) + (j * 10) + constOffset
            with: (srcReg ptr: currentPlatform sizeOfInt) + 200 + (j * 30) + constOffset]]

Cheers,
Henry |
Hello Henrik,

Thank you very much for your answer. However, the code you provided is some sort of assembly, right? So does that mean I need to learn assembly to do what I want?

2015-06-08 19:56 GMT+02:00 Henrik Johansen <[hidden email]>:
|
There are many ways to Rome :)
If you just need some externally allocated objects in the formats you specified, you can do the cache extraction using nothing but normal Smalltalk:

intArray := NBExternalArray ofType: 'int'.
data := intArray new: 1000.
1 to: data size do: [:i | data at: i put: i].
cache := intArray new: 50.
0 to: 4 do: [:j |
    1 to: 10 do: [:k |
        cache at: (j * 10) + k put: (data at: 199 + (30 * j) + k)]].

But if you want to take full advantage of the performance boost NB offers, you'd write a NativeBoost function to do the cache extraction*, as I outlined last time:

MyClass class >> #createCacheOf:in:

createCacheOf: aSource in: aDestination
    <primitive: #primitiveNativeCall module: #NativeBoostPlugin>
    "Should work on both x86 and x64, as long as sizeOf: lookups work correctly"
    ^ self nbCallout
        function: #(void (int * aSource, int * aDestination))
        emit: [:gen :proxy :asm |
            | destReg srcReg tmpReg intSize ptrSize |
            intSize := NBExternalType sizeOf: 'int'.
            ptrSize := NBExternalType sizeOf: 'void *'.
            "Only use caller-saved regs, no preservation needed"
            destReg := asm EAX as: ptrSize.
            srcReg := asm ECX as: ptrSize.
            tmpReg := asm EDX as: intSize.
            asm pop: srcReg.
            asm pop: destReg.
            0 to: 4 do: [:j |
                0 to: 9 do: [:offset |
                    asm
                        "Displacement is in bytes, not ptr element size :S, so we have to multiply the offset by that manually :S"
                        mov: tmpReg with: srcReg ptr + (199 + (j * 30) + offset * intSize);
                        mov: destReg ptr + ((j * 10) + offset * intSize) with: tmpReg]]]

and use that:

intArray := NBExternalArray ofType: 'int'.
data := intArray new: 1000.
1 to: data size do: [:i | data at: i put: i].
cache := intArray new: 50.
MyClass createCacheOf: data in: cache.

The difference using a simple [] bench is about two orders of magnitude: 11 million cache extractions per second for the inline assembly version, while the naive loop achieves around 110k. 
Cheers,
Henry

*as: is not yet defined; it could be something like:

AJx86GPRegister >> #as: aSize
    ^ self isHighByte
        ifTrue: [ self asLowByte as: aSize ]
        ifFalse: [
            AJx86Registers
                generalPurposeWithIndex: self index
                size: aSize
                requiresRex: self index > (aSize > 1 ifTrue: [7] ifFalse: [3])
                prohibitsRex: false ]
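[The comment in the emit: block above — displacements are raw byte counts, not element indices — is the key detail of the assembly addressing. As a sketch, here is the same arithmetic in plain C using byte offsets on a char pointer; note that the 199 base in the Smalltalk version compensates for NBExternalArray's 1-based fill, so with 0-based C arrays holding data[i] == i the base index is 200. The function name is illustrative, not a NativeBoost API.]

```c
#include <string.h>

/* Mimic the assembly's addressing: displacements are raw byte counts,
   so element indices must be scaled by sizeof(int) by hand. */
void extract_bytewise(void *dstBase, const void *srcBase) {
    char *dst = (char *)dstBase;
    const char *src = (const char *)srcBase;
    for (int j = 0; j < 5; j++) {
        for (int offset = 0; offset < 10; offset++) {
            int tmp;  /* plays the role of tmpReg (EDX) */
            memcpy(&tmp, src + (200 + j * 30 + offset) * sizeof(int), sizeof(int));
            memcpy(dst + (j * 10 + offset) * sizeof(int), &tmp, sizeof(int));
        }
    }
}
```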
|
Forgot to change this: you need to pass in the ExternalArray addresses as parameters, not the ExternalArrays themselves.

MyClass createCacheOf: data address in: cache address

Cheers,
Henry
|
In reply to this post by Matthieu
As I understand it, in general, the problem you described is the following: in Smalltalk you cannot reference an element of an array, only the object (the array, in that case) as a whole.

The reason it is like this is that the VM moves objects around, and you cannot directly control when that happens; the VM is also responsible for updating all pointers (references) to the moved object(s) for all interested parties (which could be other objects, the stack, etc.), making sure all references remain consistent upon such a move.

So, with such constraints, the only way to validly point to an element inside an array is to store two values separately:
- a reference to the object that represents your buffer (which the VM will update at will)
- an index (or offset) into that object, pointing to the element in your buffer

Unfortunately, this is the only way we could safely implement such an, let's say, 'ElementPointer'. It can then be passed to C function(s), converting object reference + offset into a simple address just before invoking a function (and, sure thing, knowing that there's no chance of triggering a GC in between, else it will turn into a pointer to the wrong place; but that's a general problem of passing pointers into the object memory heap, not exclusive to an 'element pointer' and such).

For buffers allocated externally, e.g. outside the heap governed by the VM, nothing prevents you from having an address that points inside some buffer (or even outside it :). For NBExternalAddress:

addr := self allocate: somespace.
newAddr := NBExternalAddress value: addr value + someoffset.

or

newAddr := addr copy value: addr value + someoffset.

Sure, it is then up to you how to calculate offsets and buffer size(s), as well as how to allocate/deallocate memory for the buffers you are using.

On 8 June 2015 at 16:41, Matthieu Lacaton <[hidden email]> wrote:
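[The 'ElementPointer' idea described above — store a handle plus an offset, and resolve to a raw address only at the last moment — can be sketched in C, with a relocation standing in for a GC move. All names here are illustrative, not any actual VM or NativeBoost API.]

```c
#include <stdlib.h>
#include <string.h>

/* A movable buffer: the runtime may reallocate 'base' at any time,
   so raw pointers into the buffer go stale. Illustrative only. */
typedef struct { int *base; } Handle;

/* An 'element pointer' stores the handle plus an offset, never a raw address. */
typedef struct { Handle *handle; size_t offset; } ElementPointer;

/* Resolve to a raw address only at the last moment, e.g. right before a C call. */
static int *resolve(ElementPointer ep) {
    return ep.handle->base + ep.offset;
}

/* Simulate the VM moving the object (e.g. during GC compaction). */
static void relocate(Handle *h, size_t nElems) {
    int *fresh = malloc(nElems * sizeof(int));
    memcpy(fresh, h->base, nElems * sizeof(int));
    free(h->base);
    h->base = fresh;
}
```

A raw int* taken before relocate() would dangle, while resolve(ep) stays valid because it re-reads the handle each time.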
-- Best regards,
Igor Stasenko. |
In reply to this post by Henrik Sperre Johansen
Henrik
you amaze me :)

Stef

On 9/6/15 14:59, Henrik Johansen wrote:
> There are many ways to Rome :) |
In reply to this post by Igor Stasenko
@ Igor
Yes! Exactly that. I'm bad at explaining things :(
Alright, thank you very much for your explanations! By the way, is there a way to disable the GC for a short period of time and then re-enable it?

I am not sure I understand every bit of your code right now, but I will definitely study it because it looks awesome. Moreover, performance is quite important for me, so your solution is very attractive and I'll try to use it. Thanks a lot!

I find it both fun and amazing what you can do with Pharo. I never thought I would do assembly inside Pharo!

Again, a big thanks to both of you,

Cheers,

Matthieu

2015-06-09 17:43 GMT+02:00 Igor Stasenko <[hidden email]>:
|
On 9 June 2015 at 20:05, Matthieu Lacaton <[hidden email]> wrote:
me too, sometimes. :)
Well, some aspects of GC behavior can be controlled, but they serve rather for fine-tuning, or for picking a strategy ahead of time when you know what application is going to run. So, at the application level, you can use them, but not at the level of a library/framework (as in the case of NB), because there's no way to determine what/where it will be used; so fiddling with the GC is the worst possible way to solve the problem :)

Also, in general, it would be bad practice to rely on subtle and fuzzy details of the GC triggering logic, because it is one of the most sophisticated parts of the VM and subject to future changes. So, instead of relying on implementation details, a new contract between the VM and the language side is introduced, called 'object pinning': pinned objects are no longer subject to relocation in memory. It means that you will be able to control that chosen object(s) will not be relocated in memory, regardless of how often the VM triggers a GC and what is involved. And that comes with Spur.
-- Best regards,
Igor Stasenko. |