You might recall my building a tool for online testing
with Iliad and GNU Smalltalk. Today was "the" day, and here is what happened. I had gst running on one of my webservers, students accessed it through an apache proxy set up to serve static files and forward the remaining requests to the app. As testers we had 5 classes with about 25 students each. The test they took had 13 exercises with a total of 95 questions. The clients were about 25 PCs in a private subnet, going through a NAT-ing DSL router with something between 200 kbit/s and 500 kbit/s upstream and 2 Mbit/s and 6 Mbit/s downstream (IIRC), shared with a few other users across the rest of the school grounds. Response time was snappy, iliad's "ajax loader" mostly only flashed in the corner, without a chance of showing off the animation. From the users' point of view, it was a nice change of things, and they quite liked it, once we got over the initial hurdle (below). I watched the app through atop and paid attention to the RSIZE, i.e. the amount of physical RAM the gst-remote process was using 1) after startup, before any access: 19 MB 2) after displaying the registration page with a single widget for 25 clients: 33 MB 3) after everybody started working, i.e. with 25 full blown test widgets: 65 MB 4) after finishing the test: between 100 MB and 110 MB I have saved the images of every run gst-remote --eval="ObjectMemory snapshot" and will try to find out what has caused this growth. CPU load on the server varied upto 90% during the initial phase (more below) and upto 30% during test execution. The numbers are not really reliable, as the machine is hosting a few other apps, which might have contributed. The machine is a "small" single core 64 bit AMD64 3700+ with 1GB RAM. Now for the interesting problem, that managed to make me a bit nervous... and would have been a total showstopper, had it not happened in this "experimental testing session". The students of class 6a logged into their Windows domain accounts, started Firefox and entered the URL for the test (stage 1 above). Then they entered their names into the registration page (stage 2) and clicked on the button to access the test. Shortly after server CPU load went to 100% with the following error message being repeated as fast as the remote terminal could cope with: "Socket accept error: Error while trying to accept a socket connection" Client side a one-liner 500 error message was reported. Time for pkill gst-remote ... I rebuilt the image and started the server again. This time we staged the 25 "almost simultaneous" login attempts into four batches of 6 each and things worked fine from that point on. After finishing the test, the students logged off and the next class, 6b ... had the exact same experience ... and 6c and 6d, too. For the final group I tried a different approach: They logged on, opened the URL, and sat on their hands. I killed gst-remote, rebuilt the image, restarted gst-remote and told them to reload the page. They then entered their names and started clicking on the answers and the Socket error of Doom appeared again. Kill, rebuild, restart. Everybody loads the registration page (not staged, just 25 students clicking when they're ready), enters their name and works on the test as it should be. No hiccup. I am very open to any suggestions as to what could have caused this misbehavior. I don't think iliad is concerned (besides generating ajax requests), so swazoo and the gst socket implementation are my next suspects. My vague-feeling-in-the-gut-proved-by-handwaving hypothesis is that it might be the combination of building a fairly large widget tree *and* creating a bunch of new socket connections at the same time that's causing the trouble. I'll try to build a test bed to reproduce the disaster in a controlled setting, but it will be a few days before I can really get to this. Any ideas? s. _______________________________________________ help-smalltalk mailing list [hidden email] http://lists.gnu.org/mailman/listinfo/help-smalltalk |
On 07/16/2009 03:33 PM, Stefan Schmiedl wrote:
> You might recall my building a tool for online testing > with Iliad and GNU Smalltalk. Today was "the" day, and > here is what happened. Thanks for the experience! > From the users' point of view, it was a nice change > of things, and they quite liked it, once we got over > the initial hurdle (below). What were they using before? > I have saved the images of every run > gst-remote --eval="ObjectMemory snapshot" > and will try to find out what has caused this growth. What is the actual size of the images? > The students of class 6a logged into their Windows domain accounts, > started Firefox and entered the URL for the test (stage 1 above). > Then they entered their names into the registration page (stage 2) > and clicked on the button to access the test. Shortly after server > CPU load went to 100% with the following error message being repeated > as fast as the remote terminal could cope with: > > "Socket accept error: Error while trying to accept a socket connection" Do you have the backtrace? > I'll try to build a test bed to reproduce the disaster in a controlled > setting, but it will be a few days before I can really get to this. > > Any ideas? You could use Ruby's Mechanize for that. http://mechanize.rubyforge.org/mechanize/ -- just to show we're not zealots! :-) Paolo _______________________________________________ help-smalltalk mailing list [hidden email] http://lists.gnu.org/mailman/listinfo/help-smalltalk |
On Fri, 17 Jul 2009 13:12:45 +0200
Paolo Bonzini <[hidden email]> wrote: > Thanks for the experience! I left out the horrific part where the usual last minute fix from the night before broke half of the system ... > > > From the users' point of view, it was a nice change > > of things, and they quite liked it, once we got over > > the initial hurdle (below). > > What were they using before? usual school equipment: paper and pencils :-) > > I have saved the images of every run > > gst-remote --eval="ObjectMemory snapshot" > > and will try to find out what has caused this growth. > > What is the actual size of the images? stefan@g128 testdrive $ ls -l ../ot.im *.im -rw-r--r-- 1 stefan stefan 6234160 16. Jul 17:14 ../ot.im <- "clean image" -r--r--r-- 1 stefan stefan 76702080 16. Jul 09:30 ot6a.im <- "used image" -r--r--r-- 1 stefan stefan 92434208 16. Jul 10:35 ot6b.im <- "special case" -r--r--r-- 1 stefan stefan 74360512 16. Jul 11:24 ot6c.im <- "used image" -r--r--r-- 1 stefan stefan 79733448 16. Jul 11:59 ot6d.im <- "used image" -r--r--r-- 1 stefan stefan 79306448 16. Jul 12:33 ot6e.im <- "used image" The "special case" was a group of students who were sitting there somewhat longer, probably clicking around some more. > > The students of class 6a logged into their Windows domain accounts, > > started Firefox and entered the URL for the test (stage 1 above). > > Then they entered their names into the registration page (stage 2) > > and clicked on the button to access the test. Shortly after server > > CPU load went to 100% with the following error message being > > repeated as fast as the remote terminal could cope with: > > > > "Socket accept error: Error while trying to accept a socket > > connection" > > Do you have the backtrace? There's not much in there. I only have two logs, one contains: Loading package Sockets Loading package Sport Loading package Iconv Loading package Iliad-Core Loading package Iliad-More-UI Loading package Magritte Loading package Iliad-More-Widgets Loading package Iliad-More-Magritte Loading package Swazoo Loading package Iliad-More-Examples Loading package Iliad-More Loading package Iliad-Swazoo Loading package Iliad Loading package OnlineTester gst-remote server started. Socket accept error: Error while trying to accept a socket connection. ... repeated 34461 times until pkill gst-remote ... /usr/local/share/smalltalk/kernel/../scripts/Remote.st:281: Terminated (ip 12)HomedAssociation(Association)>>#= (ip 38)Set>>#findIndex: (ip 6)Set(HashedCollection)>>#findIndexOrNil: (ip 4)Set(HashedCollection)>>#remove:ifAbsent: (ip 8)Set(Collection)>>#mourn: (ip 6)HomedAssociation>>#mourn (ip 4)[] in ProcessorScheduler>>#initialize (ip 24)Array(SequenceableCollection)>>#do: (ip 6)[] in ProcessorScheduler>>#initialize (ip 4)[] in BlockClosure>>#newProcessWith: (ip 52)[] in Process>>#onBlock:at:suspend: (ip 10)<unwind> BlockClosure>>#on:do: (ip 14)[] in Process>>#onBlock:at:suspend: (ip 4)<unwind> BlockClosure>>#ensure: (ip 10)[] in Process>>#onBlock:at:suspend: (ip 42)[] in BlockClosure>>#asContext: (ip 12)BlockContext class>>#fromClosure:parent: and the other ends with the "Remote.st:281: Terminated" line after 40417 "Socket accept error" lines. In that case, I'm fairly sure that the time between the first error and the pkill was below 1 min. > > I'll try to build a test bed to reproduce the disaster in a > > controlled setting, but it will be a few days before I can really > > get to this. > > > > Any ideas? > > You could use Ruby's Mechanize for that. > http://mechanize.rubyforge.org/mechanize/ -- just to show we're not > zealots! :-) huh ... ruby, gst, what's the difference? ducking and covering, s. _______________________________________________ help-smalltalk mailing list [hidden email] http://lists.gnu.org/mailman/listinfo/help-smalltalk |
I really should make a blog post out of this ... heck it's enough
for another mini-series :-) Executive Summary: There are MANY objects in the image related to Iliad. How do we get rid of them? On Fri, 17 Jul 2009 15:09:21 +0200 Stefan Schmiedl <[hidden email]> wrote: > > What is the actual size of the images? > > stefan@g128 testdrive $ ls -l ../ot.im *.im > -rw-r--r-- 1 stefan stefan 6234160 16. Jul 17:14 ../ot.im <- "clean image" > -r--r--r-- 1 stefan stefan 76702080 16. Jul 09:30 ot6a.im <- "used image" > -r--r--r-- 1 stefan stefan 92434208 16. Jul 10:35 ot6b.im <- "special case" > -r--r--r-- 1 stefan stefan 74360512 16. Jul 11:24 ot6c.im <- "used image" > -r--r--r-- 1 stefan stefan 79733448 16. Jul 11:59 ot6d.im <- "used image" > -r--r--r-- 1 stefan stefan 79306448 16. Jul 12:33 ot6e.im <- "used image" ot.im is the preloaded image, my OnlineTester package, depending on Iliad. The other images were saved via gst-remote --eval="ObjectMemory snapshot" after finishing the test session. I ran a bit of code on each of these images: Object subclassesDo: [ :c | |s| s := c allInstances size. s > 0 ifTrue: [ c name print. ': ' display. s printNl ] ] Then I realized, that I was missing all the interesting classes, but manually comparing allSubclassesDo output is really too much... so let's gain some hacking points. Collect the "clean" state in ot.im: usage := IdentityDictionary new. Object allSubclassesDo: [ :cls | |count| count := cls allInstances size. usage at: cls put: count ]. s := FileStream fopen: 'counts.dump' mode: #write. [ ObjectDumper dump: usage to: s ] ensure: [ s close ] The dictionary usage contains 1237 items and we can now compare it to the "used" state in the other images: s := FileStream fopen: 'counts.dump' mode: #read. [ usage := ObjectDumper loadFrom: s ] ensure: [ s close ] Object allSubclassesDo: [ :cls | |count orig| count := cls allInstances size. orig := usage at: cls ifAbsent: [ 0 ]. ( count > 1 and: [count > (orig * 2)] ) ifTrue: [ cls display. ': ' display. orig print. ' -> ' display. count printNl. ] ]. ot6a.im has run with an error in the application code, so the data might be a little bit off, let's look at some impressive changes in ot6b.im. I've grouped them into some chunks of related (in my point of view) classes. "My" object counts match expections: OnlineTester.OTApplication: 0 -> 27 OnlineTester.OTAufgabe: 0 -> 13 OnlineTester.OTFrage: 0 -> 95 OnlineTester.OTRegisterWidget: 0 -> 27 OnlineTester.OTTestWidget: 0 -> 27 OnlineTester.OTAufgabeWidget: 0 -> 351 = 13 * 27 OnlineTester.OTFrageWidget: 0 -> 2565 = 95 * 27 (I guess) Swazoo and Socket objects don't seem to be out of range, either, considering the Process count. Process: 5 -> 258 Swazoo.ContentTypeField: 0 -> 246 Swazoo.GenericHeaderField: 0 -> 2704 Swazoo.HTTPAcceptField: 0 -> 246 Swazoo.HTTPConnection: 0 -> 246 Swazoo.HTTPConnectionField: 0 -> 492 Swazoo.HTTPContentLengthField: 0 -> 246 Swazoo.HTTPDateField: 0 -> 246 Swazoo.HTTPGet: 0 -> 246 Swazoo.HTTPHeaders: 0 -> 492 Swazoo.HTTPHostField: 0 -> 246 Swazoo.HTTPRefererField: 0 -> 201 Swazoo.HTTPRequestLine: 0 -> 246 Swazoo.HTTPResponse: 0 -> 246 Swazoo.HTTPServerField: 0 -> 246 Swazoo.HTTPSetCookieField: 0 -> 206 Swazoo.HTTPUserAgentField: 0 -> 246 Swazoo.SwazooBuffer: 0 -> 246 Swazoo.SwazooSocket: 0 -> 248 Swazoo.SwazooStream: 0 -> 246 Swazoo.SwazooTask: 0 -> 246 Swazoo.SwazooURI: 0 -> 447 Swazoo.URIResolution: 0 -> 246 Sockets.IPAddress: 0 -> 500 Sockets.ReadBuffer: 0 -> 247 Sockets.ServerSocket: 0 -> 3 Sockets.StreamSocket: 0 -> 246 Sockets.TCPSocketImpl: 0 -> 251 SpIPAddress: 0 -> 2 SpSocket: 0 -> 248 SpTimestamp: 0 -> 492 Iliad object counts seem to be a bit high, except for Session, and they stay that way, even after a garbage collection. Iliad.Session: 0 -> 28 Iliad.Action: 0 -> 16618 Iliad.BreakElement: 0 -> 16613 Iliad.Decorator: 0 -> 2999 Iliad.DivElement: 0 -> 15498 Iliad.FormElement: 0 -> 5166 Iliad.Id: 0 -> 3054 Iliad.InputElement: 0 -> 5166 Iliad.LabelElement: 0 -> 16613 Iliad.RadioButtonElement: 0 -> 16613 Iliad.RawHtmlElement: 0 -> 26945 Iliad.SpanElement: 0 -> 5166 I guess the following counts are closely related to the Iliad objects: BlockClosure: 1254 -> 19541 BlockContext: 304 -> 23255 String: 14635 -> 95456 The FileStream count shows that my apache proxy did not work as expected. Most of those are pointing to static files I had tried to keep out of swazoo's hair. FileStream: 3 -> 33 Some things have gone wrong during usage: Kernel.CoreException: 7 -> 19 Let's look at them: Kernel.CoreException allInstancesDo: [ :e | e signalClass printNl ] Error Exception Halt Iliad.CurrentContext Iliad.DispatchError Iliad.ResponseNotification Kernel.CoreException MessageNotUnderstood Notification Signal SpAbstractError SpError Swazoo.HTTPException Swazoo.SwazooStreamNoDataError SystemExceptions.FileError SystemExceptions.PrimitiveFailed SystemExceptions.ProcessBeingTerminated SystemExceptions.UnhandledException SystemExceptions.UserInterrupt SystemExceptions.VMError The most shocking increases are found in the collection hierarchy: IdentityDictionary: 17 -> 48 let's count the sizes Bag(0:1 1:30 251:1 3:2 2:1 6:1 14:1 13:1 17:1 23:1 22:1 ) What do I need 30 IdentityDictionaries with 1 item for? OnlineTester.OTApplication->an OTApplication ah ... must come from iliad sessions WeakArray: 3 -> 993 hmm... they're gone now ... weaklings, afraid of being examined. HomedAssociation: 3 -> 548 note to self: look up what that is. map temporary stuff to nil Association: 524 -> 178658 Dictionary: 10 -> 108373 the former are stored in the latter and are almost completely iliad stuff OrderedCollection: 60 -> 109188 let's count sizes again Bag(0:70960 32:3 1:27178 97:1 3:5211 2:160 64:1 7:85 4:57 6:57 5:3 288:1 135:1 14:1117 8:199 13:30 9:28 11:4052 10:31 43:2 12:3 576:1 28:1 17:1 92:1 18:1 3246:1 20:1 2304:1 1152:1 ) ooh ... 70k empty OC's the 27k single element OC's have either a RawHtmlElement or a cookie the 5k 3-element OC's are either the mime-type collection ('application/json' ' text/javascript' ' */*' ) or (a DivElement a FormElement a DivElement ) NB the singel OC with 3246 elements contains either classes or (quite often) nil ... what's that? _______________________________________________ help-smalltalk mailing list [hidden email] http://lists.gnu.org/mailman/listinfo/help-smalltalk |
> ot.im is the preloaded image, my OnlineTester package, depending > on Iliad. The other images were saved via > gst-remote --eval="ObjectMemory snapshot" > after finishing the test session. Can you place them somewhere I can download them? They are interesting. > [ ObjectDumper dump: usage to: s ] ensure: [ s close ] Nice :-) > Swazoo and Socket objects don't seem to be out of range, either, > considering the Process count. > Process: 5 -> 258 A bit too many? Maybe something is holding onto them. > Iliad object counts seem to be a bit high, except for Session, > and they stay that way, even after a garbage collection. Then we have to see what's the root for them. Maybe some Process? > Iliad.CurrentContext > Iliad.ResponseNotification > Notification > SystemExceptions.ProcessBeingTerminated These are notifications and are not a problem. > HomedAssociation: 3 -> 548 > note to self: look up what that is. > map temporary stuff to nil Related to weak collections and finalizations. Probably sockets. > Association: 524 -> 178658 > Dictionary: 10 -> 108373 > the former are stored in the latter and are almost > completely iliad stuff 100000 dictionaries are a bit too many. And anyway changing them to LookupTables would save 178658 * 12 bytes for the associations. > OrderedCollection: 60 -> 109188 > let's count sizes again > Bag(0:70960 32:3 1:27178 97:1 3:5211 2:160 64:1 > 7:85 4:57 6:57 5:3 288:1 135:1 14:1117 8:199 > 13:30 9:28 11:4052 10:31 43:2 12:3 576:1 28:1 > 17:1 92:1 18:1 3246:1 20:1 2304:1 1152:1 ) > ooh ... 70k empty OC's 70000 * approx 40 bytes here to save too if you can use lazy initialization... > the 5k 3-element OC's are either the mime-type collection > ('application/json' ' text/javascript' ' */*' ) Probably some result of parsing Accept fields? For many OrderedCollections, adding asArray strategically would save memory at the cost of creating more garbage. Summary: there's definitely something to optimize in code that creates too many collections. Paolo _______________________________________________ help-smalltalk mailing list [hidden email] http://lists.gnu.org/mailman/listinfo/help-smalltalk |
> Yes, definitely. I'm a bit afraid by all these actions, because I know > that many of them must be obsolete. I think it's time to introduce a new > class, something like ActionRegistry, to manage actions a bit more > finely, per widget probably. That would be nice. Still, congratulations (to both Nico and Stefan, for different reasons) for the usage of Iliad in production. Iliad showed it was way more than a proof of concept, now it's time to optimize it! (Still, I'm curious about all those processes. I'll look at them soon). Paolo _______________________________________________ help-smalltalk mailing list [hidden email] http://lists.gnu.org/mailman/listinfo/help-smalltalk |
In reply to this post by Stefan Schmiedl
A couple more data points to give Nicolas some work.
Let's get a random OC: st> a := (OrderedCollection allInstances at: 14000) OrderedCollection () and let's see where it comes from. Some objects fail to print because they point to files on Stefan's machine, so.. st> (b := a allOwners) collect: [ :e | e class ] (Iliad.RawHtmlElement MethodContext CallinProcess Array Array Array Array Array Array MethodContext MethodContext Association MethodContext CallinProcess BlockClosure MethodContext MethodContext MethodContext ) st> b first inspect An instance of Iliad.RawHtmlElement children: OrderedCollection () attributes: Dictionary ( ) contents: 'falsch' a RawHtmlElement Doh. Here go ~400 bytes: st> b first attributes basicSize 32 st> b first children basicSize 16 13 megabytes: st> Iliad.RawHtmlElement allInstances count: [ :e | e children size = 0 & e attributes size = 0 ] 26945 plus 6 more for Iliad.BreakElement: st> Iliad.Element allSubclasses asArray collect: [ :e | e->(e allInstances count: [ :e | e children size = 0 & e attributes size = 0 ]) ] .... Iliad.BreakElement->16613 .... Regarding children, there are 5166 InputElements and 16613 RadioButtonElement that would save 150 bytes each. and so on. I'll look at Accept later. Paolo _______________________________________________ help-smalltalk mailing list [hidden email] http://lists.gnu.org/mailman/listinfo/help-smalltalk |
On Mon, 20 Jul 2009 21:34:32 +0200
Paolo Bonzini <[hidden email]> wrote: > st> (b := a allOwners) collect: [ :e | e class ] allOwners ... good thing I didn't start writing a reference finder :-) > st> b first attributes basicSize > 32 is basicSize the RAM allocated for "the object itself", i.e. mostly the pointers referencing other objects? those are two very valuable tools ... thanks. > st> b first children basicSize > 16 > > 13 megabytes: > > st> Iliad.RawHtmlElement allInstances count: [ :e | e children size = 0 > & e attributes size = 0 ] > 26945 > > plus 6 more for Iliad.BreakElement: heh ... 6MB for <br/>? that's about all the breaks I need in a lifetime :-) > I'll look at Accept later. This is great. If Nico's reorganization action works out and if we can further reduce the amount of RAM per element, Iliad will run really well even with scarce resources. I'll be tied up with other work until Wednesday, but hope to look at this again then. Thanks for all your effort, s. _______________________________________________ help-smalltalk mailing list [hidden email] http://lists.gnu.org/mailman/listinfo/help-smalltalk |
In reply to this post by Stefan Schmiedl
Oops, hit send too early.
Let's see where they come from. st> (c := b first allOwners) collect: [ :e | e class ] (OrderedCollection Array Array Array Array Array Array Array Array Array Array Array Array Array Array Array Array Array Array Array Array Array Array Array Array Array Array Array Array Array Array Array Array Array Array Array Array Array Array ) st> (d := c first allOwners) collect: [ :e | e class ] (Iliad.LabelElement Array Array Array Array Array Array Array ) st> (e := d first allOwners first allOwners) collect: [ :e | e class ] (Iliad.FormElement Array Array Array Array Array Array Array ) st> (f := e first allOwners first allOwners) collect: [ :e | e class ] (Iliad.DivElement Array Array Array Array Array Array Array ) st> (g := f first allOwners first allOwners) collect: [ :e | e class ] (BlockContext BlockContext BlockContext BlockContext Array Array ) st> g first method [] in OnlineTester.OTFrageWidget>>contents st> g second method [] in OnlineTester.OTFrageWidget>>contents st> g third method [] in OnlineTester.OTFrageWidget>>contents st> (h := g first allOwners) collect: [ :e | e class ] (BlockClosure Array ) st> (i := h first allOwners) collect: [ :e | e class ] (Iliad.Action Array Array ) st> (j := i first allOwners) collect: [ :e | e class ] (MethodContext CallinProcess MethodContext MethodContext MethodContext Association BlockClosure BlockClosure BlockClosure BlockClosure BlockClosure BlockClosure MethodContext CallinProcess MethodContext Array ) Uh-oh. Quite a few owners, in particular: st> (j at: 6) '100058'->an Action st> (k := (j at: 6) allOwners) collect: [ :e | e class ] ... st> (k at: 8) class Dictionary st> (k at: 8) size 681 681 actions means that using a Dictionary for them was not bad (just a LookupTable would save memory). Sorry Nicolas for your ActionRegistry. :-) At least, the number of widgets is what you'd expect: st> actions := (k at: 8) asArray st> widgets := (IdentitySet withAll: (actions collect: [:e |e block receiver]) asArray st> widgets size 95 st> widgets first frage aufgabe test aufgaben inject: 0 into: [ :o :a | o + a fragen size ] 95 ... and tells us that the dictionary covered the Actions in a single session as expected (there are 95*27 OTFrageWidgets, 95 per pupil). The number of actions overall is also correct, so there is no leak here: it should be roughly 2*number of answers per quiz (the AJAX request rebuilds the tree and creates new actions), and we get st> widgets first frage aufgabe test aufgaben inject: 0 into: [ :o :a | o + (a fragen size * a antworten size * 2) ] 610 Probably sometimes (s)he tried different answers: :-) st> (widgets collect: [:w| actions count: [:e | e block receiver == w]]) asBag Bag(6:61 4:1 12:7 8:14 16:1 9:11 ) Again, this is for one pupil only. Indeed we have overall over 16000 actions (as pointed out by Stefan in his first analysis), and these had better stay in the Dictionary. Instead for the st> ((Iliad.Element allSubclasses collect: [:e|e allInstances size]) st> fold: [:a :b|a+b]) 64222 elements given the back-of-the-envelope analysis of OrderedCollection and Dictionary usage, we are spending at least ~32 megabytes out of 90 (probably more like 40-50). >> IDEAS: I would probably replace the children OrderedCollection with a tail+next circular linked list representation. At the beginning tail := nil, if it is not nil the head is accessible with tail next. Inserting at the tail is easy. All can be encapsulated into #childrenDo: of course, and it saves a slot compared to head+tail+next. Consider that the OrderedCollection also costs two slots (one slot for the OC instance variable and one slot in the OC itself) so you'll save memory for the empty OC slots (probably like 100 bytes per element, totalling ~6 MB). For the attributes dictionary, reusing the RBSmallDictionary code should work well, or maybe lazy initalization would be okay coupled with moving a few attributes out of the dictionary: st> ((Dictionary allInstances select: [ :e | e size = 1 ]) collect: [ :e | e keys ]) asBag Bag(Set ('class' ):15498 Set (#created ):1 Set ('for' ):16613 Set (#selectorsMap ):1 ) st> ((Dictionary allInstances select: [ :e | e size = 2 ]) collect: [ :e | e keys ]) asBag Bag(Set (#expirySeconds #charset ):1 Set (#modified #created ):27 Set ('class' 'id' ):5166 ) st> ((Dictionary allInstances select: [ :e | e size = 3 ]) collect: [ :e | e keys ]) asBag Bag(Set ('value' 'name' 'type' ):5166 ) st> ((Dictionary allInstances select: [ :e | e size = 4 ]) collect: [ :e | e keys ]) asBag Bag(Set ('id' #value 'name' 'type' ):11447 ) st> ((Dictionary allInstances select: [ :e | e size = 5 ]) collect: [ :e | e keys ]) asBag Bag(... Set ('accept-charset' 'class' 'method' 'action' 'onsubmit' ):5166 ) Again, the break-even point is very low. Even storing 5 attributes in all HTML elements including <br> would be okay, since a Dictionary costs 4 words for flags+class+ size+object pointer, and 1 word for the tally. But of course value/name/type would be stored only in an InputElement etc., only id+class+lazily initialized children would be in the superclass. At this point, even using a special SmallDictionary class might be premature optimization... Since forms will be common in AJAX applications, saving memory there is nice. Maybe accept-charset could be left into attributes, but without the #ifAbsentPut: here: beforePrintHtml [ <category: 'printing'> self attributeAt: 'action' ifAbsentPut: [self url asString]. self attributeAt: 'method' ifAbsent: [self usePost]. self attributeAt: 'accept-charset' ifAbsentPut: [self session charset]. ... ] since 99.99% it will be sane to use the default taken from the session. Probably there are more such cases where #ifAbsentPut: is used. Strings are many (mostly from Action ids and from Swazoo's HTTP header parser), but they consume little memory: st> String allInstances inject: 0 into: [ :o :i | o + (i size + 7 bitAnd: -8) + 32 ] 347616 A final note regarding Swazoo, st> a := OrderedCollection allInstances select: [:e | e size = 3 and: [ e first isString ]] Shows some OrderedCollection ('application/json' ' text/javascript' ' */*') indeed, but only 43. Paolo _______________________________________________ help-smalltalk mailing list [hidden email] http://lists.gnu.org/mailman/listinfo/help-smalltalk |
In reply to this post by Stefan Schmiedl
On 07/20/2009 09:59 PM, Stefan Schmiedl wrote:
> On Mon, 20 Jul 2009 21:34:32 +0200 > Paolo Bonzini<[hidden email]> wrote: > >> st> (b := a allOwners) collect: [ :e | e class ] > > allOwners ... good thing I didn't start writing a reference finder :-) > >> st> b first attributes basicSize >> 32 > > is basicSize the RAM allocated for "the object itself", i.e. mostly the > pointers referencing other objects? No, it's the number of indexed instance variables ("slots" in the OrderedCollection or Dictionary, elements in a non-growable collection like Array or String). Paolo _______________________________________________ help-smalltalk mailing list [hidden email] http://lists.gnu.org/mailman/listinfo/help-smalltalk |
In reply to this post by Stefan Schmiedl
On 07/20/2009 11:04 PM, nicolas petton wrote:
> Hi, > > I just committed ActionRegistry. > > It's still young code, but so far it works nicely. > An ActionRegistry is now responsible for holding all actions related to > its owner. This way widgets release all their obsolete actions each time > they are rebuilt. For each normal request, all registries are also > cleaned up. > > This way there shouldn't be obsolete actions which blocks keep > references to Iliad elements. I ran some tests with the blog and counter > examples, and the difference is impressive. I would still use a Dictionary there (the online tester has ~300 actions active at the same time, IIUC, out of ~700 created through the entire navigation). Nice though! Paolo _______________________________________________ help-smalltalk mailing list [hidden email] http://lists.gnu.org/mailman/listinfo/help-smalltalk |
> Ok, but would you use a dictionary in the ActionRegistry, or would you > drop ActionRegistry too? > Right now actions are stored in OrderedCollection in registries. The former (a LookupTable actually). Paolo _______________________________________________ help-smalltalk mailing list [hidden email] http://lists.gnu.org/mailman/listinfo/help-smalltalk |
In reply to this post by Paolo Bonzini-3
> >> IDEAS: I would probably replace the children > OrderedCollection with a tail+next circular linked list > representation. At the beginning tail := nil, if it is not nil > the head is accessible with tail next. Inserting at the tail > is easy. All can be encapsulated into #childrenDo: of course, > and it saves a slot compared to head+tail+next. Consider that > the OrderedCollection also costs two slots (one slot for the OC > instance variable and one slot in the OC itself) so you'll save > memory for the empty OC slots (probably like 100 bytes per > element, totalling ~6 MB). > > For the attributes dictionary, reusing the RBSmallDictionary > > What is the difference between LookupTable and RBSmallDIctionary? What > would be the more appropriate, and in which cases? RBSmallDictionary gives up hashing, and just does a linear search over an OrderedCollection. LookupTable is the same as a Dictionary, but it doesn't use Associations and this makes it use less memory. Paolo _______________________________________________ help-smalltalk mailing list [hidden email] http://lists.gnu.org/mailman/listinfo/help-smalltalk |
nicolas petton wrote:
> Using a LookupTable in ActionRegistry seems to slow down Iliad (a few ms > per request). More importantly, gst does a *lot* of garbage collections > (every 5 requests or so). This is very annoying, because it really slows > down the response time. > > Any hint? Can you send me the patch to ActionRegistry and the LookupTable version of the file, so I can try to track where the allocations come from? Also, does the same happen with Dictionary? Paolo _______________________________________________ help-smalltalk mailing list [hidden email] http://lists.gnu.org/mailman/listinfo/help-smalltalk |
Paolo Bonzini wrote:
> nicolas petton wrote: >> Using a LookupTable in ActionRegistry seems to slow down Iliad (a few ms >> per request). More importantly, gst does a *lot* of garbage collections >> (every 5 requests or so). This is very annoying, because it really slows >> down the response time. >> >> Any hint? > > Can you send me the patch to ActionRegistry and the LookupTable version > of the file, so I can try to track where the allocations come from? Nevermind, it's in the svn repository. This however still holds: > Also, does the same happen with Dictionary? Paolo _______________________________________________ help-smalltalk mailing list [hidden email] http://lists.gnu.org/mailman/listinfo/help-smalltalk |
In reply to this post by Paolo Bonzini-2
nicolas petton wrote:
> Using a LookupTable in ActionRegistry seems to slow down Iliad (a few ms > per request). More importantly, gst does a *lot* of garbage collections > (every 5 requests or so). This is very annoying, because it really slows > down the response time. I can only guess that this: 2003-06-25 Paolo Bonzini <[hidden email]> * kernel/LookupTable.st: Remove values instance variable, store key/value pairs in adjacent slots. was a bad idea. But, the garbage collection still puzzles me. Can you try adding a _gst_show_backtrace(stdout) in _gst_global_gc and see what is printed? Alternatively, I see you also have OrderedCollection in Session for the per-widget Action registries. Can you test the performance of Dictionary vs. LookupTable there, too? Paolo _______________________________________________ help-smalltalk mailing list [hidden email] http://lists.gnu.org/mailman/listinfo/help-smalltalk |
In reply to this post by Stefan Schmiedl
Hi Paolo, Nico.
Another piece of the puzzle? On Thu, 16 Jul 2009 15:33:16 +0200 Stefan Schmiedl <[hidden email]> wrote: > > The students of class 6a logged into their Windows domain accounts, > started Firefox and entered the URL for the test (stage 1 above). > Then they entered their names into the registration page (stage 2) > and clicked on the button to access the test. Shortly after server > CPU load went to 100% with the following error message being repeated > as fast as the remote terminal could cope with: > > "Socket accept error: Error while trying to accept a socket connection" > > Client side a one-liner 500 error message was reported. > > Time for pkill gst-remote ... I rebuilt the image and started the > server again. This time we staged the 25 "almost simultaneous" login > attempts into four batches of 6 each and things worked fine from that point on. > > After finishing the test, the students logged off and the next class, 6b ... > had the exact same experience ... and 6c and 6d, too. > > For the final group I tried a different approach: > They logged on, opened the URL, and sat on their hands. > I killed gst-remote, rebuilt the image, restarted gst-remote and told them > to reload the page. They then entered their names and started clicking on > the answers and the Socket error of Doom appeared again. Kill, rebuild, > restart. Everybody loads the registration page (not staged, just 25 students > clicking when they're ready), enters their name and works on the test as it > should be. No hiccup. While I have not yet managed to reproduce the error message through a ruby mechanize script, I have noticed something suspicious: Start the server, check sockets on the server server # netstat -n | grep 4080 server # Run a mechanize script performing a few requests on the client. The script fetches the first page and the referenced css and js files. client $ ruby mech.rb 1 client $ Look at sockets on client client $ netstat -n | grep 4080 tcp 0 0 192.168.1.5:37021 88.198.5.34:4080 FIN_WAIT2 Look at sockets on server server # netstat -n | grep 4080 tcp 0 0 88.198.5.34:4080 93.223.36.238:37021 CLOSE_WAIT Wait about 10 min .... (typing this text) Look at sockets on client client $ netstat -n | grep 4080 client $ Look at sockets on server server # netstat -n | grep 4080 tcp 0 0 88.198.5.34:4080 93.223.36.238:37021 CLOSE_WAIT Run mechanize script again: client $ ruby mech.rb 1 client $ Sockets on client: client $ netstat -n | grep 4080 tcp 0 0 192.168.1.5:57747 88.198.5.34:4080 FIN_WAIT2 Sockets on server: server # netstat -n | grep 4080 tcp 0 0 88.198.5.34:4080 93.223.36.238:37021 CLOSE_WAIT tcp 0 0 88.198.5.34:4080 93.223.36.238:57747 CLOSE_WAIT soooo.... the problem described above has nothing to do with timing issues, but instead resource exhaustion due to _too many_ open sockets in CLOSE_WAIT state. Note also that the problem is heavily exacerbated when the app is accessed through an apache proxy as was done in the test session. In this scenario, running the same requests as above client $ ruby mech.rb 1 results in the following server-side mess: server # netstat -n | grep 4080 tcp 0 0 127.0.0.1:4080 127.0.0.1:57163 CLOSE_WAIT tcp 0 0 127.0.0.1:4080 127.0.0.1:57157 CLOSE_WAIT tcp 0 0 127.0.0.1:57155 127.0.0.1:4080 FIN_WAIT2 tcp 0 0 127.0.0.1:4080 127.0.0.1:57156 CLOSE_WAIT tcp 0 0 127.0.0.1:57163 127.0.0.1:4080 FIN_WAIT2 tcp 0 0 127.0.0.1:57157 127.0.0.1:4080 FIN_WAIT2 tcp 0 0 127.0.0.1:4080 127.0.0.1:57161 CLOSE_WAIT tcp 0 0 127.0.0.1:57153 127.0.0.1:4080 FIN_WAIT2 tcp 0 0 127.0.0.1:57161 127.0.0.1:4080 FIN_WAIT2 tcp 0 0 127.0.0.1:4080 127.0.0.1:57155 CLOSE_WAIT tcp 0 0 127.0.0.1:57162 127.0.0.1:4080 TIME_WAIT tcp 0 0 127.0.0.1:4080 127.0.0.1:57159 CLOSE_WAIT tcp 0 0 127.0.0.1:57154 127.0.0.1:4080 FIN_WAIT2 tcp 0 0 127.0.0.1:4080 127.0.0.1:57158 CLOSE_WAIT tcp 0 0 127.0.0.1:57156 127.0.0.1:4080 FIN_WAIT2 tcp 0 0 127.0.0.1:57160 127.0.0.1:4080 FIN_WAIT2 tcp 0 0 127.0.0.1:57158 127.0.0.1:4080 FIN_WAIT2 tcp 0 0 127.0.0.1:4080 127.0.0.1:57160 CLOSE_WAIT tcp 0 0 127.0.0.1:57159 127.0.0.1:4080 FIN_WAIT2 tcp 0 0 127.0.0.1:4080 127.0.0.1:57153 CLOSE_WAIT tcp 0 0 127.0.0.1:4080 127.0.0.1:57154 CLOSE_WAIT The FIN_WAIT2 sockets will disappear after a while, the *10* CLOSE_WAIT sockets won't. And since they are already closed, they won't be reused either, AFAICT. Now look at what google found for me: http://www.sunmanagers.org/pipermail/summaries/2006-January/007068.html I think, one of swazoo/sport/socket needs a behavioral readjustment. s. _______________________________________________ help-smalltalk mailing list [hidden email] http://lists.gnu.org/mailman/listinfo/help-smalltalk |
> While I have not yet managed to reproduce the error message through > a ruby mechanize script, I have noticed something suspicious: Very suspicious. Go slowly = time to do GCs = more finalization = closed sockets. Paolo _______________________________________________ help-smalltalk mailing list [hidden email] http://lists.gnu.org/mailman/listinfo/help-smalltalk |
In reply to this post by Stefan Schmiedl
I can reproduce this with "strace -e accept,socket,close" on your image.
Lots of accepts, very few closes... Paolo _______________________________________________ help-smalltalk mailing list [hidden email] http://lists.gnu.org/mailman/listinfo/help-smalltalk |
In reply to this post by Stefan Schmiedl
On Thu, 23 Jul 2009 00:48:10 +0200
Paolo Bonzini <[hidden email]> wrote: > Please send the script so I can try to reproduce this on your image. :-) attached s. _______________________________________________ help-smalltalk mailing list [hidden email] http://lists.gnu.org/mailman/listinfo/help-smalltalk mech.rb (1K) Download Attachment |
Free forum by Nabble | Edit this page |