Smalltalk › Gnu

the test test - an experience report

_‹ Previous Topic Next Topic _›

Classic

List

Threaded

24 messages Options

Stefan Schmiedl

the test test - an experience report

You might recall my building a tool for online testing
with Iliad and GNU Smalltalk. Today was "the" day, and
here is what happened.

I had gst running on one of my webservers, students
accessed it through an apache proxy set up to serve
static files and forward the remaining requests to
the app. As testers we had 5 classes with about 25
students each. The test they took had 13 exercises
with a total of 95 questions.

The clients were about 25 PCs in a private subnet,
going through a NAT-ing DSL router with something
between 200 kbit/s and 500 kbit/s upstream and
2 Mbit/s and 6 Mbit/s downstream (IIRC), shared with a few
other users across the rest of the school grounds.
Response time was snappy, iliad's "ajax loader"
mostly only flashed in the corner, without a chance
of showing off the animation.

From the users' point of view, it was a nice change
of things, and they quite liked it, once we got over
the initial hurdle (below).

I watched the app through atop and paid attention to
the RSIZE, i.e. the amount of physical RAM the gst-remote
process was using
1) after startup, before any access: 19 MB
2) after displaying the registration page with a single widget
for 25 clients: 33 MB
3) after everybody started working, i.e. with 25 full blown
test widgets: 65 MB
4) after finishing the test: between 100 MB and 110 MB

I have saved the images of every run
gst-remote --eval="ObjectMemory snapshot"
and will try to find out what has caused this growth.

CPU load on the server varied upto 90% during the initial phase
(more below) and upto 30% during test execution. The numbers are
not really reliable, as the machine is hosting a few other apps,
which might have contributed. The machine is a "small" single
core 64 bit AMD64 3700+ with 1GB RAM.

Now for the interesting problem, that managed to make me
a bit nervous... and would have been a total showstopper,
had it not happened in this "experimental testing session".

The students of class 6a logged into their Windows domain accounts,
started Firefox and entered the URL for the test (stage 1 above).
Then they entered their names into the registration page (stage 2)
and clicked on the button to access the test. Shortly after server
CPU load went to 100% with the following error message being repeated
as fast as the remote terminal could cope with:

"Socket accept error: Error while trying to accept a socket connection"

Client side a one-liner 500 error message was reported.

Time for pkill gst-remote ... I rebuilt the image and started the
server again. This time we staged the 25 "almost simultaneous" login
attempts into four batches of 6 each and things worked fine from that point on.

After finishing the test, the students logged off and the next class, 6b ...
had the exact same experience ... and 6c and 6d, too.

For the final group I tried a different approach:
They logged on, opened the URL, and sat on their hands.
I killed gst-remote, rebuilt the image, restarted gst-remote and told them
to reload the page. They then entered their names and started clicking on
the answers and the Socket error of Doom appeared again. Kill, rebuild,
restart. Everybody loads the registration page (not staged, just 25 students
clicking when they're ready), enters their name and works on the test as it
should be. No hiccup.

I am very open to any suggestions as to what could have caused this
misbehavior. I don't think iliad is concerned (besides generating ajax requests),
so swazoo and the gst socket implementation are my next suspects.

My vague-feeling-in-the-gut-proved-by-handwaving hypothesis is that it
might be the combination of building a fairly large widget tree *and*
creating a bunch of new socket connections at the same time that's causing
the trouble.

I'll try to build a test bed to reproduce the disaster in a controlled
setting, but it will be a few days before I can really get to this.

Any ideas?

s.

_______________________________________________
help-smalltalk mailing list
[hidden email]
http://lists.gnu.org/mailman/listinfo/help-smalltalk

Paolo Bonzini-3

Re: the test test - an experience report

On 07/16/2009 03:33 PM, Stefan Schmiedl wrote:
> You might recall my building a tool for online testing
> with Iliad and GNU Smalltalk. Today was "the" day, and
> here is what happened.

Thanks for the experience!

> From the users' point of view, it was a nice change
> of things, and they quite liked it, once we got over
> the initial hurdle (below).

What were they using before?

> I have saved the images of every run
> gst-remote --eval="ObjectMemory snapshot"
> and will try to find out what has caused this growth.

What is the actual size of the images?

> The students of class 6a logged into their Windows domain accounts,
> started Firefox and entered the URL for the test (stage 1 above).
> Then they entered their names into the registration page (stage 2)
> and clicked on the button to access the test. Shortly after server
> CPU load went to 100% with the following error message being repeated
> as fast as the remote terminal could cope with:
>
> "Socket accept error: Error while trying to accept a socket connection"

Do you have the backtrace?

> I'll try to build a test bed to reproduce the disaster in a controlled
> setting, but it will be a few days before I can really get to this.
>
> Any ideas?

You could use Ruby's Mechanize for that.
http://mechanize.rubyforge.org/mechanize/ -- just to show we're not
zealots! :-)

Paolo

_______________________________________________
help-smalltalk mailing list
[hidden email]
http://lists.gnu.org/mailman/listinfo/help-smalltalk

Stefan Schmiedl

Re: the test test - an experience report

On Fri, 17 Jul 2009 13:12:45 +0200
Paolo Bonzini <[hidden email]> wrote:

> Thanks for the experience!

I left out the horrific part where the usual last minute fix from the
night before broke half of the system ...

>
> > From the users' point of view, it was a nice change
> > of things, and they quite liked it, once we got over
> > the initial hurdle (below).
>
> What were they using before?

usual school equipment: paper and pencils :-)

> > I have saved the images of every run
> > gst-remote --eval="ObjectMemory snapshot"
> > and will try to find out what has caused this growth.
>
> What is the actual size of the images?

stefan@g128 testdrive $ ls -l ../ot.im *.im
-rw-r--r-- 1 stefan stefan 6234160 16. Jul 17:14 ../ot.im <- "clean image"
-r--r--r-- 1 stefan stefan 76702080 16. Jul 09:30 ot6a.im <- "used image"
-r--r--r-- 1 stefan stefan 92434208 16. Jul 10:35 ot6b.im <- "special case"
-r--r--r-- 1 stefan stefan 74360512 16. Jul 11:24 ot6c.im <- "used image"
-r--r--r-- 1 stefan stefan 79733448 16. Jul 11:59 ot6d.im <- "used image"
-r--r--r-- 1 stefan stefan 79306448 16. Jul 12:33 ot6e.im <- "used image"

The "special case" was a group of students who were sitting there somewhat
longer, probably clicking around some more.

> > The students of class 6a logged into their Windows domain accounts,
> > started Firefox and entered the URL for the test (stage 1 above).
> > Then they entered their names into the registration page (stage 2)
> > and clicked on the button to access the test. Shortly after server
> > CPU load went to 100% with the following error message being
> > repeated as fast as the remote terminal could cope with:
> >
> > "Socket accept error: Error while trying to accept a socket
> > connection"
>
> Do you have the backtrace?

There's not much in there. I only have two logs, one contains:

Loading package Sockets
Loading package Sport
Loading package Iconv
Loading package Iliad-Core
Loading package Iliad-More-UI
Loading package Magritte
Loading package Iliad-More-Widgets
Loading package Iliad-More-Magritte
Loading package Swazoo
Loading package Iliad-More-Examples
Loading package Iliad-More
Loading package Iliad-Swazoo
Loading package Iliad
Loading package OnlineTester
gst-remote server started.
Socket accept error: Error while trying to accept a socket connection.
... repeated 34461 times until pkill gst-remote ...
/usr/local/share/smalltalk/kernel/../scripts/Remote.st:281: Terminated
(ip 12)HomedAssociation(Association)>>#=
(ip 38)Set>>#findIndex:
(ip 6)Set(HashedCollection)>>#findIndexOrNil:
(ip 4)Set(HashedCollection)>>#remove:ifAbsent:
(ip 8)Set(Collection)>>#mourn:
(ip 6)HomedAssociation>>#mourn
(ip 4)[] in ProcessorScheduler>>#initialize
(ip 24)Array(SequenceableCollection)>>#do:
(ip 6)[] in ProcessorScheduler>>#initialize
(ip 4)[] in BlockClosure>>#newProcessWith:
(ip 52)[] in Process>>#onBlock:at:suspend:
(ip 10)<unwind> BlockClosure>>#on:do:
(ip 14)[] in Process>>#onBlock:at:suspend:
(ip 4)<unwind> BlockClosure>>#ensure:
(ip 10)[] in Process>>#onBlock:at:suspend:
(ip 42)[] in BlockClosure>>#asContext:
(ip 12)BlockContext class>>#fromClosure:parent:

and the other ends with the "Remote.st:281: Terminated" line after 40417
"Socket accept error" lines. In that case, I'm fairly sure that the time between
the first error and the pkill was below 1 min.

> > I'll try to build a test bed to reproduce the disaster in a
> > controlled setting, but it will be a few days before I can really
> > get to this.
> >
> > Any ideas?
>
> You could use Ruby's Mechanize for that.
> http://mechanize.rubyforge.org/mechanize/ -- just to show we're not
> zealots! :-)

huh ... ruby, gst, what's the difference?

ducking and covering,
s.

_______________________________________________
help-smalltalk mailing list
[hidden email]
http://lists.gnu.org/mailman/listinfo/help-smalltalk

Stefan Schmiedl

Re: the test test - a more detailed analysis of image contents

I really should make a blog post out of this ... heck it's enough
for another mini-series :-)

Executive Summary:

There are MANY objects in the image related to Iliad.
How do we get rid of them?

On Fri, 17 Jul 2009 15:09:21 +0200
Stefan Schmiedl <[hidden email]> wrote:

> > What is the actual size of the images?
>
> stefan@g128 testdrive $ ls -l ../ot.im *.im
> -rw-r--r-- 1 stefan stefan 6234160 16. Jul 17:14 ../ot.im <- "clean image"
> -r--r--r-- 1 stefan stefan 76702080 16. Jul 09:30 ot6a.im <- "used image"
> -r--r--r-- 1 stefan stefan 92434208 16. Jul 10:35 ot6b.im <- "special case"
> -r--r--r-- 1 stefan stefan 74360512 16. Jul 11:24 ot6c.im <- "used image"
> -r--r--r-- 1 stefan stefan 79733448 16. Jul 11:59 ot6d.im <- "used image"
> -r--r--r-- 1 stefan stefan 79306448 16. Jul 12:33 ot6e.im <- "used image"

ot.im is the preloaded image, my OnlineTester package, depending
on Iliad. The other images were saved via
gst-remote --eval="ObjectMemory snapshot"
after finishing the test session.

I ran a bit of code on each of these images:

Object subclassesDo: [ :c | |s|
s := c allInstances size.
s > 0 ifTrue: [
c name print.
': ' display.
s printNl
]
]

Then I realized, that I was missing all the interesting classes,
but manually comparing allSubclassesDo output is really too much...
so let's gain some hacking points.

Collect the "clean" state in ot.im:
usage := IdentityDictionary new.
Object allSubclassesDo: [ :cls | |count|
count := cls allInstances size.
usage at: cls put: count
].
s := FileStream fopen: 'counts.dump' mode: #write.
[ ObjectDumper dump: usage to: s ] ensure: [ s close ]

The dictionary usage contains 1237 items and we can now
compare it to the "used" state in the other images:
s := FileStream fopen: 'counts.dump' mode: #read.
[ usage := ObjectDumper loadFrom: s ] ensure: [ s close ]
Object allSubclassesDo: [ :cls | |count orig|
count := cls allInstances size.
orig := usage at: cls ifAbsent: [ 0 ].
( count > 1 and: [count > (orig * 2)] ) ifTrue: [
cls display.
': ' display.
orig print.
' -> ' display.
count printNl.
]
].

ot6a.im has run with an error in the application code, so
the data might be a little bit off, let's look at some
impressive changes in ot6b.im. I've grouped them into some
chunks of related (in my point of view) classes.

"My" object counts match expections:
OnlineTester.OTApplication: 0 -> 27
OnlineTester.OTAufgabe: 0 -> 13
OnlineTester.OTFrage: 0 -> 95
OnlineTester.OTRegisterWidget: 0 -> 27
OnlineTester.OTTestWidget: 0 -> 27
OnlineTester.OTAufgabeWidget: 0 -> 351 = 13 * 27
OnlineTester.OTFrageWidget: 0 -> 2565 = 95 * 27 (I guess)

Swazoo and Socket objects don't seem to be out of range, either,
considering the Process count.
Process: 5 -> 258
Swazoo.ContentTypeField: 0 -> 246
Swazoo.GenericHeaderField: 0 -> 2704
Swazoo.HTTPAcceptField: 0 -> 246
Swazoo.HTTPConnection: 0 -> 246
Swazoo.HTTPConnectionField: 0 -> 492
Swazoo.HTTPContentLengthField: 0 -> 246
Swazoo.HTTPDateField: 0 -> 246
Swazoo.HTTPGet: 0 -> 246
Swazoo.HTTPHeaders: 0 -> 492
Swazoo.HTTPHostField: 0 -> 246
Swazoo.HTTPRefererField: 0 -> 201
Swazoo.HTTPRequestLine: 0 -> 246
Swazoo.HTTPResponse: 0 -> 246
Swazoo.HTTPServerField: 0 -> 246
Swazoo.HTTPSetCookieField: 0 -> 206
Swazoo.HTTPUserAgentField: 0 -> 246
Swazoo.SwazooBuffer: 0 -> 246
Swazoo.SwazooSocket: 0 -> 248
Swazoo.SwazooStream: 0 -> 246
Swazoo.SwazooTask: 0 -> 246
Swazoo.SwazooURI: 0 -> 447
Swazoo.URIResolution: 0 -> 246
Sockets.IPAddress: 0 -> 500
Sockets.ReadBuffer: 0 -> 247
Sockets.ServerSocket: 0 -> 3
Sockets.StreamSocket: 0 -> 246
Sockets.TCPSocketImpl: 0 -> 251
SpIPAddress: 0 -> 2
SpSocket: 0 -> 248
SpTimestamp: 0 -> 492

Iliad object counts seem to be a bit high, except for Session,
and they stay that way, even after a garbage collection.
Iliad.Session: 0 -> 28
Iliad.Action: 0 -> 16618
Iliad.BreakElement: 0 -> 16613
Iliad.Decorator: 0 -> 2999
Iliad.DivElement: 0 -> 15498
Iliad.FormElement: 0 -> 5166
Iliad.Id: 0 -> 3054
Iliad.InputElement: 0 -> 5166
Iliad.LabelElement: 0 -> 16613
Iliad.RadioButtonElement: 0 -> 16613
Iliad.RawHtmlElement: 0 -> 26945
Iliad.SpanElement: 0 -> 5166

I guess the following counts are closely related to
the Iliad objects:
BlockClosure: 1254 -> 19541
BlockContext: 304 -> 23255
String: 14635 -> 95456

The FileStream count shows that my apache proxy did not work
as expected. Most of those are pointing to static files I had
tried to keep out of swazoo's hair.
FileStream: 3 -> 33

Some things have gone wrong during usage:
Kernel.CoreException: 7 -> 19

Let's look at them:
Kernel.CoreException allInstancesDo: [ :e |
e signalClass printNl
]
Error
Exception
Halt
Iliad.CurrentContext
Iliad.DispatchError
Iliad.ResponseNotification
Kernel.CoreException
MessageNotUnderstood
Notification
Signal
SpAbstractError
SpError
Swazoo.HTTPException
Swazoo.SwazooStreamNoDataError
SystemExceptions.FileError
SystemExceptions.PrimitiveFailed
SystemExceptions.ProcessBeingTerminated
SystemExceptions.UnhandledException
SystemExceptions.UserInterrupt
SystemExceptions.VMError

The most shocking increases are found in the collection
hierarchy:

IdentityDictionary: 17 -> 48
let's count the sizes
Bag(0:1 1:30 251:1 3:2 2:1 6:1 14:1 13:1 17:1 23:1 22:1 )
What do I need 30 IdentityDictionaries with 1 item for?
OnlineTester.OTApplication->an OTApplication
ah ... must come from iliad sessions

WeakArray: 3 -> 993
hmm... they're gone now ... weaklings, afraid of being
examined.

HomedAssociation: 3 -> 548
note to self: look up what that is.
map temporary stuff to nil

Association: 524 -> 178658
Dictionary: 10 -> 108373
the former are stored in the latter and are almost
completely iliad stuff

OrderedCollection: 60 -> 109188
let's count sizes again
Bag(0:70960 32:3 1:27178 97:1 3:5211 2:160 64:1
7:85 4:57 6:57 5:3 288:1 135:1 14:1117 8:199
13:30 9:28 11:4052 10:31 43:2 12:3 576:1 28:1
17:1 92:1 18:1 3246:1 20:1 2304:1 1152:1 )
ooh ... 70k empty OC's
the 27k single element OC's have either a RawHtmlElement
or a cookie
the 5k 3-element OC's are either the mime-type collection
('application/json' ' text/javascript' ' */*' )
or
(a DivElement a FormElement a DivElement )
NB the singel OC with 3246 elements contains either
classes or (quite often) nil ... what's that?

_______________________________________________
help-smalltalk mailing list
[hidden email]
http://lists.gnu.org/mailman/listinfo/help-smalltalk

Paolo Bonzini-2

Re: the test test - a more detailed analysis of image contents

> ot.im is the preloaded image, my OnlineTester package, depending
> on Iliad. The other images were saved via
> gst-remote --eval="ObjectMemory snapshot"
> after finishing the test session.

Can you place them somewhere I can download them? They are interesting.

> [ ObjectDumper dump: usage to: s ] ensure: [ s close ]

Nice :-)

> Swazoo and Socket objects don't seem to be out of range, either,
> considering the Process count.
> Process: 5 -> 258

A bit too many? Maybe something is holding onto them.

> Iliad object counts seem to be a bit high, except for Session,
> and they stay that way, even after a garbage collection.

Then we have to see what's the root for them. Maybe some Process?

> Iliad.CurrentContext
> Iliad.ResponseNotification
> Notification
> SystemExceptions.ProcessBeingTerminated

These are notifications and are not a problem.

> HomedAssociation: 3 -> 548
> note to self: look up what that is.
> map temporary stuff to nil

Related to weak collections and finalizations. Probably sockets.

> Association: 524 -> 178658
> Dictionary: 10 -> 108373
> the former are stored in the latter and are almost
> completely iliad stuff

100000 dictionaries are a bit too many. And anyway changing them to
LookupTables would save 178658 * 12 bytes for the associations.

> OrderedCollection: 60 -> 109188
> let's count sizes again
> Bag(0:70960 32:3 1:27178 97:1 3:5211 2:160 64:1
> 7:85 4:57 6:57 5:3 288:1 135:1 14:1117 8:199
> 13:30 9:28 11:4052 10:31 43:2 12:3 576:1 28:1
> 17:1 92:1 18:1 3246:1 20:1 2304:1 1152:1 )
> ooh ... 70k empty OC's

70000 * approx 40 bytes here to save too if you can use lazy
initialization...

> the 5k 3-element OC's are either the mime-type collection
> ('application/json' ' text/javascript' ' */*' )

Probably some result of parsing Accept fields? For many
OrderedCollections, adding asArray strategically would save memory at
the cost of creating more garbage.

Summary: there's definitely something to optimize in code that creates
too many collections.

Paolo

_______________________________________________
help-smalltalk mailing list
[hidden email]
http://lists.gnu.org/mailman/listinfo/help-smalltalk

Paolo Bonzini-2

Re: the test test - a more detailed analysis of image contents

> Yes, definitely. I'm a bit afraid by all these actions, because I know
> that many of them must be obsolete. I think it's time to introduce a new
> class, something like ActionRegistry, to manage actions a bit more
> finely, per widget probably.

That would be nice. Still, congratulations (to both Nico and Stefan,
for different reasons) for the usage of Iliad in production. Iliad
showed it was way more than a proof of concept, now it's time to
optimize it!

(Still, I'm curious about all those processes. I'll look at them soon).

Paolo

_______________________________________________
help-smalltalk mailing list
[hidden email]
http://lists.gnu.org/mailman/listinfo/help-smalltalk

Paolo Bonzini-3

Re: the test test - a more detailed analysis of image contents

In reply to this post by Stefan Schmiedl

A couple more data points to give Nicolas some work.

Let's get a random OC:

st> a := (OrderedCollection allInstances at: 14000)
OrderedCollection ()

and let's see where it comes from. Some objects fail to print because
they point to files on Stefan's machine, so..

st> (b := a allOwners) collect: [ :e | e class ]
(Iliad.RawHtmlElement MethodContext CallinProcess Array Array Array
Array Array Array MethodContext MethodContext Association MethodContext
CallinProcess BlockClosure MethodContext MethodContext MethodContext )

st> b first inspect
An instance of Iliad.RawHtmlElement
children: OrderedCollection ()
attributes: Dictionary (
)
contents: 'falsch'
a RawHtmlElement

Doh. Here go ~400 bytes:

st> b first attributes basicSize
32
st> b first children basicSize
16

13 megabytes:

st> Iliad.RawHtmlElement allInstances count: [ :e | e children size = 0
& e attributes size = 0 ]
26945

plus 6 more for Iliad.BreakElement:

st> Iliad.Element allSubclasses asArray collect: [ :e | e->(e
allInstances count: [ :e | e children size = 0 & e attributes size = 0 ]) ]
.... Iliad.BreakElement->16613 ....

Regarding children, there are 5166 InputElements and 16613
RadioButtonElement that would save 150 bytes each.

and so on.

I'll look at Accept later.

Paolo

_______________________________________________
help-smalltalk mailing list
[hidden email]
http://lists.gnu.org/mailman/listinfo/help-smalltalk

Stefan Schmiedl

Re: the test test - a more detailed analysis of image contents

On Mon, 20 Jul 2009 21:34:32 +0200
Paolo Bonzini <[hidden email]> wrote:

> st> (b := a allOwners) collect: [ :e | e class ]

allOwners ... good thing I didn't start writing a reference finder :-)

> st> b first attributes basicSize
> 32

is basicSize the RAM allocated for "the object itself", i.e. mostly the
pointers referencing other objects?

those are two very valuable tools ... thanks.

> st> b first children basicSize
> 16
>
> 13 megabytes:
>
> st> Iliad.RawHtmlElement allInstances count: [ :e | e children size = 0
> & e attributes size = 0 ]
> 26945
>
> plus 6 more for Iliad.BreakElement:

heh ... 6MB for <br/>? that's about all the breaks I need in a lifetime :-)

> I'll look at Accept later.

This is great. If Nico's reorganization action works out and if we can
further reduce the amount of RAM per element, Iliad will run really well
even with scarce resources.

I'll be tied up with other work until Wednesday, but hope to look at this
again then.

Thanks for all your effort,
s.

_______________________________________________
help-smalltalk mailing list
[hidden email]
http://lists.gnu.org/mailman/listinfo/help-smalltalk

Paolo Bonzini-3

Re: the test test - a more detailed analysis of image contents

In reply to this post by Stefan Schmiedl

Oops, hit send too early.

Let's see where they come from.

st> (c := b first allOwners) collect: [ :e | e class ]
(OrderedCollection Array Array Array Array Array Array Array Array Array
Array Array Array Array Array Array Array Array Array Array Array Array
Array Array Array Array Array Array Array Array Array Array Array Array
Array Array Array Array Array )
st> (d := c first allOwners) collect: [ :e | e class ]
(Iliad.LabelElement Array Array Array Array Array Array Array )
st> (e := d first allOwners first allOwners) collect: [ :e | e class ]
(Iliad.FormElement Array Array Array Array Array Array Array )
st> (f := e first allOwners first allOwners) collect: [ :e | e class ]
(Iliad.DivElement Array Array Array Array Array Array Array )
st> (g := f first allOwners first allOwners) collect: [ :e | e class ]
(BlockContext BlockContext BlockContext BlockContext Array Array )

st> g first method
[] in OnlineTester.OTFrageWidget>>contents
st> g second method
[] in OnlineTester.OTFrageWidget>>contents
st> g third method
[] in OnlineTester.OTFrageWidget>>contents
st> (h := g first allOwners) collect: [ :e | e class ]
(BlockClosure Array )
st> (i := h first allOwners) collect: [ :e | e class ]
(Iliad.Action Array Array )
st> (j := i first allOwners) collect: [ :e | e class ]
(MethodContext CallinProcess MethodContext MethodContext MethodContext
Association BlockClosure BlockClosure BlockClosure BlockClosure
BlockClosure BlockClosure MethodContext CallinProcess MethodContext Array )

Uh-oh. Quite a few owners, in particular:

st> (j at: 6)
'100058'->an Action
st> (k := (j at: 6) allOwners) collect: [ :e | e class ]
...
st> (k at: 8) class
Dictionary
st> (k at: 8) size
681

681 actions means that using a Dictionary for them was not bad (just a
LookupTable would save memory). Sorry Nicolas for your ActionRegistry. :-)

At least, the number of widgets is what you'd expect:

st> actions := (k at: 8) asArray
st> widgets := (IdentitySet withAll: (actions collect: [:e |e block
receiver]) asArray
st> widgets size
95
st> widgets first frage aufgabe test aufgaben inject: 0 into: [ :o :a |
o + a fragen size ]
95

... and tells us that the dictionary covered the Actions in a single
session as expected (there are 95*27 OTFrageWidgets, 95 per pupil). The
number of actions overall is also correct, so there is no leak here: it
should be roughly 2*number of answers per quiz (the AJAX request
rebuilds the tree and creates new actions), and we get

st> widgets first frage aufgabe test aufgaben inject: 0 into: [ :o :a |
o + (a fragen size * a antworten size * 2) ]
610

Probably sometimes (s)he tried different answers: :-)

st> (widgets collect: [:w| actions count: [:e | e block receiver == w]])
asBag
Bag(6:61 4:1 12:7 8:14 16:1 9:11 )

Again, this is for one pupil only. Indeed we have overall over 16000
actions (as pointed out by Stefan in his first analysis), and these had
better stay in the Dictionary.

Instead for the

st> ((Iliad.Element allSubclasses collect: [:e|e allInstances size])
st> fold: [:a :b|a+b])
64222

elements given the back-of-the-envelope analysis of OrderedCollection
and Dictionary usage, we are spending at least ~32 megabytes out of 90
(probably more like 40-50).

>> IDEAS: I would probably replace the children
OrderedCollection with a tail+next circular linked list
representation. At the beginning tail := nil, if it is not nil
the head is accessible with tail next. Inserting at the tail
is easy. All can be encapsulated into #childrenDo: of course,
and it saves a slot compared to head+tail+next. Consider that
the OrderedCollection also costs two slots (one slot for the OC
instance variable and one slot in the OC itself) so you'll save
memory for the empty OC slots (probably like 100 bytes per
element, totalling ~6 MB).

For the attributes dictionary, reusing the RBSmallDictionary
code should work well, or maybe lazy initalization would be
okay coupled with moving a few attributes out of the dictionary:

st> ((Dictionary allInstances select: [ :e | e size = 1 ])
collect: [ :e | e keys ]) asBag
Bag(Set ('class' ):15498 Set (#created ):1 Set ('for' ):16613
Set (#selectorsMap ):1 )

st> ((Dictionary allInstances select: [ :e | e size = 2 ])
collect: [ :e | e keys ]) asBag
Bag(Set (#expirySeconds #charset ):1 Set (#modified #created
):27 Set ('class' 'id' ):5166 )

st> ((Dictionary allInstances select: [ :e | e size = 3 ])
collect: [ :e | e keys ]) asBag
Bag(Set ('value' 'name' 'type' ):5166 )

st> ((Dictionary allInstances select: [ :e | e size = 4 ])
collect: [ :e | e keys ]) asBag
Bag(Set ('id' #value 'name' 'type' ):11447 )

st> ((Dictionary allInstances select: [ :e | e size = 5 ])
collect: [ :e | e keys ]) asBag
Bag(... Set ('accept-charset' 'class' 'method' 'action'
'onsubmit' ):5166 )

Again, the break-even point is very low. Even storing
5 attributes in all HTML elements including <br> would be
okay, since a Dictionary costs 4 words for flags+class+
size+object pointer, and 1 word for the tally. But of course
value/name/type would be stored only in an InputElement etc.,
only id+class+lazily initialized children would be in the
superclass. At this point, even using a special SmallDictionary
class might be premature optimization...

Since forms will be common in AJAX applications, saving
memory there is nice. Maybe accept-charset could be left into
attributes, but without the #ifAbsentPut: here:

beforePrintHtml [
<category: 'printing'>
self attributeAt: 'action' ifAbsentPut: [self url asString].
self attributeAt: 'method' ifAbsent: [self usePost].
self attributeAt: 'accept-charset'
ifAbsentPut: [self session charset].
...
]

since 99.99% it will be sane to use the default taken from the
session. Probably there are more such cases where #ifAbsentPut:
is used.

Strings are many (mostly from Action ids and from Swazoo's HTTP header
parser), but they consume little memory:

st> String allInstances inject: 0 into: [ :o :i | o + (i size + 7
bitAnd: -8) + 32 ]
347616

A final note regarding Swazoo,

st> a := OrderedCollection allInstances select: [:e | e size = 3 and: [
e first isString ]]

Shows some OrderedCollection ('application/json' ' text/javascript' '
*/*') indeed, but only 43.

Paolo

_______________________________________________
help-smalltalk mailing list
[hidden email]
http://lists.gnu.org/mailman/listinfo/help-smalltalk

Paolo Bonzini-3

Re: the test test - a more detailed analysis of image contents

In reply to this post by Stefan Schmiedl

On 07/20/2009 09:59 PM, Stefan Schmiedl wrote:

> On Mon, 20 Jul 2009 21:34:32 +0200
> Paolo Bonzini<[hidden email]> wrote:
>
>> st> (b := a allOwners) collect: [ :e | e class ]
>
> allOwners ... good thing I didn't start writing a reference finder :-)
>
>> st> b first attributes basicSize
>> 32
>
> is basicSize the RAM allocated for "the object itself", i.e. mostly the
> pointers referencing other objects?

No, it's the number of indexed instance variables ("slots" in the
OrderedCollection or Dictionary, elements in a non-growable collection
like Array or String).

Paolo

_______________________________________________
help-smalltalk mailing list
[hidden email]
http://lists.gnu.org/mailman/listinfo/help-smalltalk

Paolo Bonzini-3

Re: the test test - a more detailed analysis of image contents

In reply to this post by Stefan Schmiedl

On 07/20/2009 11:04 PM, nicolas petton wrote:

> Hi,
>
> I just committed ActionRegistry.
>
> It's still young code, but so far it works nicely.
> An ActionRegistry is now responsible for holding all actions related to
> its owner. This way widgets release all their obsolete actions each time
> they are rebuilt. For each normal request, all registries are also
> cleaned up.
>
> This way there shouldn't be obsolete actions which blocks keep
> references to Iliad elements. I ran some tests with the blog and counter
> examples, and the difference is impressive.

I would still use a Dictionary there (the online tester has ~300 actions
active at the same time, IIUC, out of ~700 created through the entire
navigation).

Nice though!

Paolo

_______________________________________________
help-smalltalk mailing list
[hidden email]
http://lists.gnu.org/mailman/listinfo/help-smalltalk

Paolo Bonzini-3

Re: the test test - a more detailed analysis of image contents

> Ok, but would you use a dictionary in the ActionRegistry, or would you
> drop ActionRegistry too?
> Right now actions are stored in OrderedCollection in registries.

The former (a LookupTable actually).

Paolo

_______________________________________________
help-smalltalk mailing list
[hidden email]
http://lists.gnu.org/mailman/listinfo/help-smalltalk

Paolo Bonzini-2

Re: the test test - a more detailed analysis of image contents

In reply to this post by Paolo Bonzini-3

> >> IDEAS: I would probably replace the children
> OrderedCollection with a tail+next circular linked list
> representation. At the beginning tail := nil, if it is not nil
> the head is accessible with tail next. Inserting at the tail
> is easy. All can be encapsulated into #childrenDo: of course,
> and it saves a slot compared to head+tail+next. Consider that
> the OrderedCollection also costs two slots (one slot for the OC
> instance variable and one slot in the OC itself) so you'll save
> memory for the empty OC slots (probably like 100 bytes per
> element, totalling ~6 MB).
>
> For the attributes dictionary, reusing the RBSmallDictionary
>
> What is the difference between LookupTable and RBSmallDIctionary? What
> would be the more appropriate, and in which cases?

RBSmallDictionary gives up hashing, and just does a linear search over
an OrderedCollection.

LookupTable is the same as a Dictionary, but it doesn't use Associations
and this makes it use less memory.

Paolo

_______________________________________________
help-smalltalk mailing list
[hidden email]
http://lists.gnu.org/mailman/listinfo/help-smalltalk

Paolo Bonzini-2

Re: the test test - a more detailed analysis of image contents

Paolo Bonzini-2

Re: the test test - a more detailed analysis of image contents

Paolo Bonzini wrote:

> nicolas petton wrote:
>> Using a LookupTable in ActionRegistry seems to slow down Iliad (a few ms
>> per request). More importantly, gst does a *lot* of garbage collections
>> (every 5 requests or so). This is very annoying, because it really slows
>> down the response time.
>>
>> Any hint?
>
> Can you send me the patch to ActionRegistry and the LookupTable version
> of the file, so I can try to track where the allocations come from?

Nevermind, it's in the svn repository. This however still holds:

> Also, does the same happen with Dictionary?

Paolo

_______________________________________________
help-smalltalk mailing list
[hidden email]
http://lists.gnu.org/mailman/listinfo/help-smalltalk

Paolo Bonzini-2

Re: the test test - a more detailed analysis of image contents

In reply to this post by Paolo Bonzini-2

nicolas petton wrote:
> Using a LookupTable in ActionRegistry seems to slow down Iliad (a few ms
> per request). More importantly, gst does a *lot* of garbage collections
> (every 5 requests or so). This is very annoying, because it really slows
> down the response time.

I can only guess that this:

2003-06-25 Paolo Bonzini <[hidden email]>

* kernel/LookupTable.st: Remove values instance variable,
store key/value pairs in adjacent slots.

was a bad idea.

But, the garbage collection still puzzles me.

Can you try adding a _gst_show_backtrace(stdout) in _gst_global_gc and
see what is printed?

Alternatively, I see you also have OrderedCollection in Session for the
per-widget Action registries. Can you test the performance of
Dictionary vs. LookupTable there, too?

Paolo

_______________________________________________
help-smalltalk mailing list
[hidden email]
http://lists.gnu.org/mailman/listinfo/help-smalltalk

Stefan Schmiedl

Re: the test test - an experience report

In reply to this post by Stefan Schmiedl

Hi Paolo, Nico.

Another piece of the puzzle?

On Thu, 16 Jul 2009 15:33:16 +0200
Stefan Schmiedl <[hidden email]> wrote:

>
> The students of class 6a logged into their Windows domain accounts,
> started Firefox and entered the URL for the test (stage 1 above).
> Then they entered their names into the registration page (stage 2)
> and clicked on the button to access the test. Shortly after server
> CPU load went to 100% with the following error message being repeated
> as fast as the remote terminal could cope with:
>
> "Socket accept error: Error while trying to accept a socket connection"
>
> Client side a one-liner 500 error message was reported.
>
> Time for pkill gst-remote ... I rebuilt the image and started the
> server again. This time we staged the 25 "almost simultaneous" login
> attempts into four batches of 6 each and things worked fine from that point on.
>
> After finishing the test, the students logged off and the next class, 6b ...
> had the exact same experience ... and 6c and 6d, too.
>
> For the final group I tried a different approach:
> They logged on, opened the URL, and sat on their hands.
> I killed gst-remote, rebuilt the image, restarted gst-remote and told them
> to reload the page. They then entered their names and started clicking on
> the answers and the Socket error of Doom appeared again. Kill, rebuild,
> restart. Everybody loads the registration page (not staged, just 25 students
> clicking when they're ready), enters their name and works on the test as it
> should be. No hiccup.

While I have not yet managed to reproduce the error message through
a ruby mechanize script, I have noticed something suspicious:

Start the server, check sockets on the server
server # netstat -n | grep 4080
server #

Run a mechanize script performing a few requests on the client.
The script fetches the first page and the referenced css and js files.
client $ ruby mech.rb 1
client $

Look at sockets on client
client $ netstat -n | grep 4080
tcp 0 0 192.168.1.5:37021 88.198.5.34:4080 FIN_WAIT2

Look at sockets on server
server # netstat -n | grep 4080
tcp 0 0 88.198.5.34:4080 93.223.36.238:37021 CLOSE_WAIT

Wait about 10 min .... (typing this text)

Look at sockets on client
client $ netstat -n | grep 4080
client $

Look at sockets on server
server # netstat -n | grep 4080
tcp 0 0 88.198.5.34:4080 93.223.36.238:37021 CLOSE_WAIT

Run mechanize script again:
client $ ruby mech.rb 1
client $

Sockets on client:
client $ netstat -n | grep 4080
tcp 0 0 192.168.1.5:57747 88.198.5.34:4080 FIN_WAIT2

Sockets on server:
server # netstat -n | grep 4080
tcp 0 0 88.198.5.34:4080 93.223.36.238:37021 CLOSE_WAIT
tcp 0 0 88.198.5.34:4080 93.223.36.238:57747 CLOSE_WAIT

soooo.... the problem described above has nothing to do with timing issues,
but instead resource exhaustion due to _too many_ open sockets in CLOSE_WAIT
state.

Note also that the problem is heavily exacerbated when the app is accessed
through an apache proxy as was done in the test session. In this scenario,
running the same requests as above
client $ ruby mech.rb 1

results in the following server-side mess:
server # netstat -n | grep 4080
tcp 0 0 127.0.0.1:4080 127.0.0.1:57163 CLOSE_WAIT
tcp 0 0 127.0.0.1:4080 127.0.0.1:57157 CLOSE_WAIT
tcp 0 0 127.0.0.1:57155 127.0.0.1:4080 FIN_WAIT2
tcp 0 0 127.0.0.1:4080 127.0.0.1:57156 CLOSE_WAIT
tcp 0 0 127.0.0.1:57163 127.0.0.1:4080 FIN_WAIT2
tcp 0 0 127.0.0.1:57157 127.0.0.1:4080 FIN_WAIT2
tcp 0 0 127.0.0.1:4080 127.0.0.1:57161 CLOSE_WAIT
tcp 0 0 127.0.0.1:57153 127.0.0.1:4080 FIN_WAIT2
tcp 0 0 127.0.0.1:57161 127.0.0.1:4080 FIN_WAIT2
tcp 0 0 127.0.0.1:4080 127.0.0.1:57155 CLOSE_WAIT
tcp 0 0 127.0.0.1:57162 127.0.0.1:4080 TIME_WAIT
tcp 0 0 127.0.0.1:4080 127.0.0.1:57159 CLOSE_WAIT
tcp 0 0 127.0.0.1:57154 127.0.0.1:4080 FIN_WAIT2
tcp 0 0 127.0.0.1:4080 127.0.0.1:57158 CLOSE_WAIT
tcp 0 0 127.0.0.1:57156 127.0.0.1:4080 FIN_WAIT2
tcp 0 0 127.0.0.1:57160 127.0.0.1:4080 FIN_WAIT2
tcp 0 0 127.0.0.1:57158 127.0.0.1:4080 FIN_WAIT2
tcp 0 0 127.0.0.1:4080 127.0.0.1:57160 CLOSE_WAIT
tcp 0 0 127.0.0.1:57159 127.0.0.1:4080 FIN_WAIT2
tcp 0 0 127.0.0.1:4080 127.0.0.1:57153 CLOSE_WAIT
tcp 0 0 127.0.0.1:4080 127.0.0.1:57154 CLOSE_WAIT

The FIN_WAIT2 sockets will disappear after a while, the *10* CLOSE_WAIT sockets won't.
And since they are already closed, they won't be reused either, AFAICT.

Now look at what google found for me:

http://www.sunmanagers.org/pipermail/summaries/2006-January/007068.html

I think, one of swazoo/sport/socket needs a behavioral readjustment.

s.

_______________________________________________
help-smalltalk mailing list
[hidden email]
http://lists.gnu.org/mailman/listinfo/help-smalltalk

Paolo Bonzini-3

Re: the test test - an experience report

> While I have not yet managed to reproduce the error message through
> a ruby mechanize script, I have noticed something suspicious:

Very suspicious.

Go slowly = time to do GCs = more finalization = closed sockets.

Paolo

_______________________________________________
help-smalltalk mailing list
[hidden email]
http://lists.gnu.org/mailman/listinfo/help-smalltalk

Paolo Bonzini-3

Re: the test test - an experience report

In reply to this post by Stefan Schmiedl

I can reproduce this with "strace -e accept,socket,close" on your image.
Lots of accepts, very few closes...

Paolo

_______________________________________________
help-smalltalk mailing list
[hidden email]
http://lists.gnu.org/mailman/listinfo/help-smalltalk

Stefan Schmiedl

Re: the test test - an experience report

In reply to this post by Stefan Schmiedl

On Thu, 23 Jul 2009 00:48:10 +0200
Paolo Bonzini <[hidden email]> wrote:

> Please send the script so I can try to reproduce this on your image. :-)

attached
s.

_______________________________________________
help-smalltalk mailing list
[hidden email]
http://lists.gnu.org/mailman/listinfo/help-smalltalk

mech.rb (1K) Download Attachment