Hello all,
sorry for cross-posting. :) I'd like to ask if anyone could share either an image or an installed application that uses the Magma OODB. I'd like to use it to test how changing different aspects of Magma's internals affects performance.

There are many tricks, known to Chris, for speeding Magma up by cleverly fine-tuning its options, like the read strategy. But what I'd like to see is a real setup used by real people, so that by taking it I can find out how it could be made to run faster without changing the application code.

I remember someone gave a talk at ESUG saying they had been using Magma for their application but were then forced to switch to another DB layer because of bad performance issues. It would be good if you could give me that code, so I can run it and see whether things could be improved. It's not a problem if the code is not open source; we could sign an NDA if necessary.

I need something real, simply because benchmarks are sometimes not representative. :)

--
Best regards,
Igor Stasenko AKA sig.
The question is "Can Magma be used in a real-world application?"
I think not.

I used Magma a couple of years ago. I had to migrate 13,000 records from a database, build objects from the data, and find duplicates. The whole migration to Magma took more than two weeks. Yes, weeks! 15 days! And I had to do a lot of strange things to reduce the time: I ran a periodic cleanup process in the middle of the migration to improve performance. The first attempts tended towards infinity; I got it down to two weeks. The application was in production for two years, with a lot of performance problems. Finally we had to throw it away.

Now I'm trying again. This time I use very simple objects (just a couple of string variables) with only one index on one of those strings. But the problem is once again the same: each time I add an object to a MagmaCollection I have to look for duplicates and, if found, merge them. I'm adding objects to 4 MagmaCollections, each with an index of key size 64. (I need more, but once again, performance...)

Look at these numbers. I tried with 2,000 records (the total is 76,000):
- If I just add the objects without searching, it takes 2 minutes.
- If I search for the objects, it takes 16 minutes!
- If I search for the objects and then iterate the results to make a finer comparison, it takes 20 minutes!

In a linear progression, the total time for 76,000 records would be 12 hours, which is not very good. But time doesn't grow linearly; it grows exponentially. I tried with 12,000 records: 5 hours 30 minutes. How long must I suppose 76,000 records will take? 2 days? 3? These are not acceptable times for a process I will surely have to perform again as my model grows.

The numbers tell, once again, that the problem is not adding the objects, merging them, or materializing them, but searching a MagmaCollection. Time goes by and this problem is not solved.

I'm desperate. I have no way to persist my objects if I want to avoid everybody laughing at me.
Norberto

On Thu, Oct 21, 2010 at 3:33 AM, Igor Stasenko <[hidden email]> wrote:
> Hello all, [snip]

--
Norberto Manzanos
Instituto de Investigaciones en Humanidades y Ciencias Sociales (IdIHCS)
FaHCE/UNLP - CONICET
Calle 48 e/ 6 y 7 s/Nº - 8º piso - oficina 803
Tel: +54-221-4230125 interno 262
Norberto
Just thinking aloud... 76,000 records is not that much. Assuming a record is 2 kB (2,000 bytes), that would be 152,000 kB, or 152 MB, which would be fine for an image. So you might do it as an in-memory database, e.g. Sandstone:

http://www.squeaksource.com/SandstoneDb

See in particular the idea mentioned by Dan Ingalls of going for a really big image. In fact, an image of, say, 400 MB would probably not yet be considered big in his terms. What do you think? Any experience reports with images of this size holding data?

Regards
Hannes

On 11/2/10, Norberto Manzanos <[hidden email]> wrote:
> The question is "Magma could be used in a real-world application?"
> I think not.
> [snip]
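Hannes's arithmetic is easy to sanity-check (a quick sketch in Python; the 2 kB-per-record figure is his assumption, not a measurement):

```python
# Back-of-envelope: do 76,000 records fit comfortably in one image?
RECORDS = 76_000
BYTES_PER_RECORD = 2_000  # assumed average record size (2 kB)

total_bytes = RECORDS * BYTES_PER_RECORD
total_mb = total_bytes / 1_000_000
print(f"{total_mb:.0f} MB")  # 152 MB, small enough to hold in memory
```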
In reply to this post by Manzanos
On 2 November 2010 15:40, Norberto Manzanos <[hidden email]> wrote:
> The question is "Magma could be used in a real-world application?"
> I think not.

I think otherwise. Magma should strive to remove any obstacles.

> [snip]
> I have no way of persist my objects if I want to avoid everybody laughing
> at me.

Norberto, I suspect you are doing something wrong. Scanning 76,000 records by iterating through the entire collection when looking for duplicates is indeed slow, and of course the time to add each new item grows as the collection grows. It doesn't really matter what kind of DB you use; it will be slow everywhere if you need to scan the entire dataset before adding each new item. So I suspect there is something wrong with the way you are using MagmaCollection, and that because of it, performance degrades from O(log n) down to O(n) for each operation. Of course, it's hard to say anything without looking at the actual code.

--
Best regards,
Igor Stasenko AKA sig.
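Igor's complexity argument can be illustrated outside Magma (a Python sketch with illustrative counters; these are not Magma measurements): a full scan before every insert makes the total work quadratic, while an indexed lookup keeps it linear.

```python
# Compare the work done by scan-before-insert vs. indexed duplicate checks.

def load_with_scan(records):
    """O(n) duplicate scan per insert -> O(n^2) total comparisons."""
    collection, comparisons = [], 0
    for r in records:
        duplicate = False
        for existing in collection:   # full scan of everything added so far
            comparisons += 1
            if existing == r:
                duplicate = True
                break
        if not duplicate:
            collection.append(r)
    return comparisons

def load_with_index(records):
    """Hashed-index lookup per insert -> O(n) total probes."""
    index, probes = set(), 0
    for r in records:
        probes += 1                   # one keyed lookup per insert
        if r not in index:
            index.add(r)
    return probes

records = list(range(2_000))          # 2,000 unique records, as in the test run
scan_work = load_with_scan(records)
index_work = load_with_index(records)
print(scan_work, index_work)          # 1999000 vs. 2000
```

The same ratio is why a properly used index should keep each add near O(log n) instead of O(n).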
In reply to this post by Manzanos
On Tue, 2 Nov 2010, Norberto Manzanos wrote:
[snip]

> In a linear progression, total time for 76000 records would be 12 hours,
> which is not very good. But time doesn't grow lineary but exponentially.

I doubt it's exponential; I'm pretty sure it's just quadratic. IIRC there's a test case in Gjallar which pushes 100,000 records to the database. IIRC it was pretty slow, like 10 or 30 minutes, but it was based on Squeak 3.8 and was nowhere near hours, days, or weeks. Now there are a faster VM, faster sets and dictionaries, and better finalization available in Squeak and Magma. In December we will have even faster finalization (via the new official VMs), which should improve Magma's performance further.

> I tried with 12000 records: 5 hours 30 minutes. How must I supose it will
> take with 76000 records. 2 days? 3?
> The numbers tells, once again, that the problem is not adding the objects,
> merging or materializing them but searching on a MagmaCollection.

Did you try profiling the code to see what causes the slowdown? Did you tune the garbage collector? Are you using CogVM or SqueakVM? Which image do you use? Can you make the code available, so others can test it too?

Levente
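Levente's "quadratic, not exponential" guess can be checked against the two timings quoted in this thread (a rough two-point power-law fit in Python; two data points cannot prove a model, but they do make exponential blow-up look unlikely):

```python
import math

# Timings Norberto reported for the "search" case:
t_small, n_small = 16.0, 2_000     # 16 minutes for 2,000 records
t_big, n_big = 330.0, 12_000       # 5 h 30 min for 12,000 records

# If t ~ c * n^k, two points determine the growth exponent k:
k = math.log(t_big / t_small) / math.log(n_big / n_small)
print(f"k = {k:.2f}")              # ~1.69: between linear and quadratic

# Extrapolating the same power law to the full 76,000 records:
t_full_hours = t_small * (76_000 / n_small) ** k / 60
print(f"predicted: {t_full_hours:.0f} hours")
```

An exponent around 1.7 is polynomial, as Levente suggests, and the extrapolation lands near 5 days, which matches the "2 days? 3?" order of magnitude Norberto feared.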
In reply to this post by Hannes Hirzel
On Tue, Nov 2, 2010 at 12:52 PM, Hannes Hirzel <[hidden email]> wrote:
> Norberto
> [snip]

In theory, that's what everybody should do. I always dreamed of that.
I tried using collections in memory; they consume much more time. Searching medium-sized collections in memory is impossible. Anyway, the size of the Squeak image is a problem that I think is not fixed: the image sometimes grows too much, and no amount of garbage collection can return it to its original size. I had to reconstruct the whole code from the packages several times in order to reclaim dozens of ghost MB. I think Seaside is the problem, but I'm not sure. I think that with a really big image these problems would be dramatic.

I'll try Sandstone. Thanks for the tip.
--
Norberto Manzanos
Instituto de Investigaciones en Humanidades y Ciencias Sociales (IdIHCS)
FaHCE/UNLP - CONICET
Calle 48 e/ 6 y 7 s/Nº - 8º piso - oficina 803
Tel: +54-221-4230125 interno 262
In reply to this post by Igor Stasenko
Hey!!! I'm not scanning the entire database. What about the indexes? I tried adding all the objects first and performing the comparison afterwards, and it takes even more time. I did this before with traditional technology, SQL and all that, and I never had to wait several days for the result.
Ok, let's see the code.

"This method creates the repository:"

createMagmaRepository
	| thePath theSession |
	thePath := FileDirectory default pathName assurePath, '\magma'.
	((FileDirectory on: thePath) fileExists: 'objects') ifFalse: [
		(Smalltalk at: #MagmaSession) disconnectAndCloseAllConnectedSessions.
		(Smalltalk at: #MagmaRepositoryController) delete: thePath.
		(Smalltalk at: #MagmaRepositoryController)
			create: thePath
			root: (IdentityDictionary new
				at: #works put: self workCatalog;
				at: #expressions put: self expressionCatalog;
				at: #manifestations put: self manifestationCatalog;
				yourself).
		theSession := (Smalltalk at: #MagmaSession) openLocal: thePath.
		theSession connectAs: 'nor'].
	^theSession

"This method creates the collections (workCatalog, expressionCatalog, manifestationCatalog):"

initializeCollection
	collection := MagmaCollection new.
	collection addIndex:
		((MaSearchStringIndex attribute: #authorizedName)
			keySize: 64;
			beAscii)

"These two methods add and find the objects:"

add: anObject
	| found |
	found := self find: anObject.
	found isNil
		ifTrue: [self collection add: anObject]
		ifFalse: [self merger merge: anObject on: found].
	^anObject

find: anAccessPoint
	| found |
	found := self collection where: [:reader |
		reader read: #authorizedName at: anAccessPoint authorizedName].
	"authorizedName is a string"
	^found

As I already said, merging is not the problem, so I omit the merging methods. Transactions are driven by an iterator which lets you change the transaction step, so that's not a problem either; I'm committing every 300 additions.

"This is the main method:"

BibHuma instance useMagmaCollections.
ferberizator iterator: (MagmaIterator newForIsis list: (1 to: 2000)). "records to migrate to Magma"
ferberizator iterator transactionStep: 300.
session := MagmaSession openLocal: FileDirectory default pathName assurePath, 'magma'.
session connectAs: 'nm'.
ferberizator iterator session: session.
session readStrategy: (MaReadStrategy minimumDepth: 0).
ferberizator build
--
Norberto Manzanos
Instituto de Investigaciones en Humanidades y Ciencias Sociales (IdIHCS)
FaHCE/UNLP - CONICET
Calle 48 e/ 6 y 7 s/Nº - 8º piso - oficina 803
Tel: +54-221-4230125 interno 262
On 3 November 2010 00:02, Norberto Manzanos <[hidden email]> wrote:
> Hey!!!! I'm not scanning the entire data base. What about indexes?
> I tried to add all objects before and perform the comparision after, and
> it takes even more time.
> [snip]

Okay, here is what I tried with the bleeding-edge magma-tester suite and the Squeak VM:

MagmaRepositoryController
	create: FileDirectory default pathName, '\foo'
	root: OrderedCollection new.

| session |
session := MagmaSession openLocal: FileDirectory default pathName, '\foo'.
session connectAs: 'nm'.
session commit: [ session root add: 1 -> 1 ].
session closeRepository.

| session temp time |
session := MagmaSession openLocal: FileDirectory default pathName, '\foo'.
session connectAs: 'nm'.
session commit: [ temp := session root at: 1 ].
time := [ 100 timesRepeat: [
	session commit: [
		1000 timesRepeat: [
			temp value: (1 -> 1).
			temp := temp value ] ] ] ] timeToRun.
session closeRepository.
time  "53704"

About 54 seconds for populating 100,000 unique objects. Now, with a MagmaCollection:

| session |
session := MagmaSession openLocal: FileDirectory default pathName, '\foo'.
session connectAs: 'nm'.
session commit: [ session root at: 1 put: MagmaCollection new ].
session closeRepository.

| session temp time |
session := MagmaSession openLocal: FileDirectory default pathName, '\foo'.
session connectAs: 'nm'.
session commit: [ temp := session root at: 1 ].
time := [ 100 timesRepeat: [
	session commit: [
		1000 timesRepeat: [ temp add: (1 -> 1) ] ] ] ] timeToRun.
session closeRepository.
time  "116307"

Again, 116 seconds. Okay, what about a MagmaCollection with an index?

| session coll |
session := MagmaSession openLocal: FileDirectory default pathName, '\foo'.
session connectAs: 'nm'.
session commit: [
	coll := MagmaCollection new.
	coll addIndex: ((MaIntegerIndex attribute: #key) keySize: 64).
	session root at: 1 put: coll ].
session closeRepository.

| session temp time i |
session := MagmaSession openLocal: FileDirectory default pathName, '\foo'.
session connectAs: 'nm'.
session commit: [ temp := session root at: 1 ].
i := 0.
time := [ 100 timesRepeat: [
	session commit: [
		1000 timesRepeat: [
			temp add: (i -> 1).
			i := i + 1 ] ] ] ] timeToRun.
session closeRepository.
time  "263306"

263 seconds, which is what I expected: maintaining an extra index roughly doubles the population time. Still, I'd prefer to see it 10-50 times faster. But as you can see, it doesn't take hours. Later I will try to complicate the test by adding a 'uniqueness' check, using a Magma reader, in the same way you do. Meanwhile, can you run this code and tell me what your numbers are?

--
Best regards,
Igor Stasenko AKA sig.
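The three timed runs work out to these rough throughputs (simple arithmetic on Igor's reported millisecond timings, each run populating 100,000 objects):

```python
# Objects per second for each of Igor's three population runs.
runs = {
    "plain root chain": 53_704,          # ms for 100,000 objects
    "MagmaCollection": 116_307,
    "indexed MagmaCollection": 263_306,
}
for name, ms in runs.items():
    rate = 100_000 / (ms / 1000)
    print(f"{name}: {rate:.0f} objects/s")
```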
They are really different:

1) Yours: 53704. Mine: 6615
2) Yours: 116307. Mine: 290471
3) Yours: 263306. Mine: 648078

My image is Squeak 3.9. Are you testing with 4.1 or Pharo? Anyway, these are acceptable numbers too.

I think the objects you are adding are excessively simple. I have just tried with objects a little more complex, but the results were good again. I don't know what to think; there's nothing really different between these tests and my real-world objects. I'll try to isolate the problem and test it again.

Thanks

--
Norberto Manzanos
Instituto de Investigaciones en Humanidades y Ciencias Sociales (IdIHCS)
FaHCE/UNLP - CONICET
Calle 48 e/ 6 y 7 s/Nº - 8º piso - oficina 803
Tel: +54-221-4230125 interno 262
On 3 November 2010 05:35, Norberto Manzanos <[hidden email]> wrote:
>> Meanwhile, can you run this code and tell me what your numbers are?

> They are really different.
>
> 1) Yours: 53704. Mine: 6615

Ohh, I just reran that code and got different numbers: 3475, 3160.

> 2) Yours: 116307. Mine: 290471
> 3) Yours: 263306. Mine: 648078

So I conclude that your image/VM/PC is about 2x slower than mine. Still, the proportions between the numbers are the same.

> Image is Squeak 3.9. Are you testing with 4.1 or Pharo?

4.2, with a VM with the new finalization.

> Anyway, these are acceptable numbers too.

Yes.

> I think the objects you are adding are excessively simple.
> I have just tried with objects a little more complex, but results were
> good again.

Bigger objects should, of course, slow things down, proportionally to the space they use, but not by orders of magnitude.

> I don't know what to think. There's nothing really different between these
> tests and my real-world objects.
> I'll try to isolate the problem and test it again.

Haven't we isolated it already? Unless it's a very inefficient MagmaCollectionReader implementation, the only thing that looks suspicious in your code is the following:

session readStrategy: (MaReadStrategy minimumDepth: 0).

> Thanks

--
Best regards,
Igor Stasenko AKA sig.
With this setting I improved performance considerably. Most of the time my program doesn't need to materialize objects, so I tried this kind of null strategy and it worked fine.

Regards
I want to start talking about Magma on the Magma list again. There have been many improvements since the last release, which I am preparing into a new release for this year that I hope will alleviate the remaining performance issues. It won't be "faster", but it will cope with degradation issues like the finalizationProcess.

Norberto, it happens that what you are doing really aggravates the finalization problem. Igor or I can help you fix that.

Also, using a MagmaCollection to access one object at a time is not a good deal, because MagmaCollections were designed for relational-style "end-user query" access, where you get lists of several objects from one where: query. Magma will do much better with an object model for other kinds of access, such as your lookup/merge behavior.

Also, by having set your readStrategy to a minimumDepth of 0, you are making a trip to the repository just to bring back _one object_ at a time. That is not a good deal either, and it further compounds the performance problem, because that one object typically comes back pointing to all proxies. So any access to your model at all is going to invoke proxies at every level, which forces Squeak into a lot of become:'s. Consider how slow Squeak's become: is:

	[ Object new becomeForward: Object new copyHash: false ] bench

So I think you are hitting all of Magma's worst cases at once. :-/ I'd like to help. Please feel free to mail the Magma mailing list, if you like; I would be interested to see if we can get Magma going for you.

- Chris

On Wed, Nov 3, 2010 at 8:53 AM, Norberto Manzanos <[hidden email]> wrote:
> With this code I could improve performance considerably. Most of the time,
> my program doesn't need to materialize objects, so I tried this kind of
> null strategy and it worked fine.
> [snip]
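Chris's point about one-object-at-a-time trips can be made concrete with a simple latency model (the overhead and per-object costs below are invented for illustration, not Magma measurements): when every object needs its own repository round trip, the fixed per-trip cost dominates everything else.

```python
# Cost model: N fetches with per-trip overhead vs. one deep batched read.
TRIP_OVERHEAD_MS = 5.0   # assumed fixed cost of one repository round trip
PER_OBJECT_MS = 0.05     # assumed marginal cost per materialized object
N = 76_000

one_at_a_time = N * (TRIP_OVERHEAD_MS + PER_OBJECT_MS)   # minimumDepth: 0 style
batched = TRIP_OVERHEAD_MS + N * PER_OBJECT_MS           # idealized deep read

print(f"{one_at_a_time / 1000:.0f} s vs {batched / 1000:.1f} s")
```

Under these assumed constants the one-at-a-time pattern is about 100x slower; the exact numbers don't matter, but the shape of the problem does.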