Hi,
I've been running a Squeak web server on the wide-open internet for over a year now (WebClient/WebServer/Seaside/nginx proxy). Within a few days it will always stop answering the phone, begin eating ~99% of the CPU, and eventually segfault (though it has become less segfaulty on this May 2020 VM). I've tried the recommended Unix tweaks for it (file handle limits, etc) but I believe there is something going on inside the image. Today's issue may be related (at least) to some code in WebMessage. tl;dr: I'm proposing a change to WebMessage>>#streamDirectlyFrom:to:size:progress: ...but my proposed change could very well be wrong, or be poor logic. So I'm writing up my lengthy rationale below. What do you think about changing: [(sizeOrNil == nil and:[srcStream atEnd not]) or:[totalRead < size]] whileTrue:[ to: [sizeOrNil == nil or: [totalRead < size and: [srcStream atEnd not]]] whileTrue:[ ? My logic could be wrong, and I don't understand the sizeOrNil check here. But after I made this change, the infinite loop went away, GC destroyed hundreds of stale Sockets and Processes, and my Seaside app began answering again (note that this is the first time I've been successful at bringing it back to life). If the original logic is correct as intended, it seems to be a trade-off between continuing to read from the stream if we haven't yet read #size characters, versus not trying to read from a stream that is #atEnd. My version breaks out of the loop if the stream is #atEnd even if the method believes there are characters left to read. The reason I think this is worthwhile, is that #totalRead will (endlessly) be 0 if the Socket underneath the SocketStream is closed, and there are no more bytes to read, thus causing an infinite loop. It is *possible* (and mentioned below) that the design is supposed to catch a #ConnectionClosed exception, but in my case one never got thrown. (note the Socket's state was #otherEndClosedButNotThisEnd) It's also possible that the method is designed to continue reading from the SocketStream in case it is expecting more data which just hasn't arrived yet (as I think I saw in a method comment somewhere), but this may not make sense when the underlying Socket has been closed and the method keeps reading zero characters from it. Of course, actually, anything is possible. What do I know? :D The rest of this message contains my exploration, debugging session, and further rationale. Thanks, a Tim ------- After running for eight days, Squeak had 300 socket connections open. This is far too many for my very, very unpopular website: $ sudo netstat -a --program | awk '/http.*squeak/ { print $6 }' | sort | uniq -c 297 CLOSE_WAIT 6 ESTABLISHED 1 LISTEN A scan of Processes belonging to WebServer returned over 500, and climbing: Process allSubInstances select: [:proc | proc suspendedContext ifNil: [ false ] ifNotNil: [:context | context closure ifNil: [ false ] ifNotNil: [:closure | | receiver | receiver := closure outerContext receiver. (receiver respondsTo: #outerContext) ifTrue: [ receiver outerContext receiver class == WebServer ] ifFalse: [ false ] ] ] ] Note that none of them are terminated: Bag newFrom: (self collect: [:proc | proc isTerminated]) -> a Dictionary(false->507 true->1 ) Most/all of them seem to be hanging out in this method: method: (BlockClosure>>#newProcess "a CompiledMethod(3871190)") closureOrNil: [closure] in BlockClosure>>newProcess receiver: [closure] in WebServer>>asyncHandleConnectionFrom: It seems that WebServer uses SocketStream on a Socket. So how many SocketStreams exist and where are they? Bag newFrom: (SocketStream allInstances collect: [:ea | ea socket statusString asSymbol]) -> a Dictionary(#invalidSocketHandle->32 #otherEndClosedButNotThisEnd->1 ) So, there's one SocketStream that's closed but we're holding on to it. Why? I opened a Process Browser, turned on CPU Watcher, turned on Auto-Update and found WebServer's listener process eating > 90% CPU. I opened a debugger on it. It seemed it was in an infinite loop centered around WebMessage>>streamDirectlyFrom:to:size:progress: on this very request. [] in SocketStream>>beSignalingWhile: BlockClosure>>ensure: SocketStream>>beSignalingWhile: [] in SocketStream>>next:into:startingAt: BlockClosure>>on:do: SocketStream>>next:into:startingAt: WebRequest(WebMessage)>>streamDirectlyFrom:to:size:progress: WebRequest(WebMessage)>>streamFrom:to:size:progress: [] in WebRequest(WebMessage)>>getContentWithProgress: ByteString class(SequenceableCollection class)>>new:streamContents: WebRequest(WebMessage)>>getContentWithProgress: WebRequest(WebMessage)>>getContent WebRequest(WebMessage)>>content WAWebServerAdaptor>>requestBodyFor: WAWebServerAdaptor(WAServerAdaptor)>>requestFor: WAWebServerAdaptor(WAServerAdaptor)>>contextFor: WAWebServerAdaptor(WAServerAdaptor)>>process: WAWebServerAdaptor>>process: MessageSend>>valueWithArguments: WebServer>>invokeAction:request: Some exploit attempt sending an HTTP POST request to /tmUnblock.cgi and with a Content-Length of 227 was causing WebMessage to lose its mind. (This is when I realized that my nginx proxy configuration might be wrong, and some obvious exploit attempts could be getting through to Squeak. But that's a different matter.) Now I believe any of the following could have been happening: * WebMessage & friends seem to take the supplied Content-Length header in an HTTP request as being literal and true, even though a client could be supplying a wrong or misleading value. I don't know if there are tests against this but I might write one. * If the Socket is closed, the code on the stack trace above will keep trying to read from it. I can't tell if: a) there is an arithmetic problem somewhere, or b) if #ConnectionClosed isn't being raised or caught in the proper way, or c) if there's a logic error in WebMessage>>streamDirectlyFrom:to:size:progress: , or d) this is totally intentional, or e) all of the above, none of the above, or something else. I opted to fix (c). * This portion of WebMessage>>streamDirectlyFrom:to:size:progress: is troublesome to me, and my change seems to make the infinite loop end: [(sizeOrNil == nil and:[srcStream atEnd not]) or:[totalRead < size]] whileTrue:[ In this case, totalRead was always less than size, because the connection was closed and the methods would always return empty strings for what had been read (thus, totalRead = 0). So, [totalRead < size] was always True, which would override the check for [srcStream atEnd not] and it would continue to try reading from a SocketStream on a closed Socket. |
Hi Tim,
On Fri, 19 Jun 2020, Tim Johnson wrote: > Hi, > I've been running a Squeak web server on the wide-open internet for over a year now (WebClient/WebServer/Seaside/nginx proxy). Within a few days it will always stop answering the phone, begin eating ~99% of the CPU, and > eventually segfault (though it has become less segfaulty on this May 2020 VM). I've tried the recommended Unix tweaks for it (file handle limits, etc) but I believe there is something going on inside the image. Today's > issue may be related (at least) to some code in WebMessage. > > tl;dr: > > I'm proposing a change to WebMessage>>#streamDirectlyFrom:to:size:progress: > ...but my proposed change could very well be wrong, or be poor logic. So I'm writing up my lengthy rationale below. > > What do you think about changing: > > [(sizeOrNil == nil and:[srcStream atEnd not]) or:[totalRead < size]] whileTrue:[ as: if no size was specified (sizeOrNil == nil, size = 0), then read while there's data available from the source stream (srcStream atEnd not). If there was a size specified, assume that its the actual number of bytes available, and read that many bytes (totalRead < size). The last line is problematic, as you pointed out in your analysis, when the source stream and the size are both provided by the other party. > > to: > > [sizeOrNil == nil or: [totalRead < size and: [srcStream atEnd not]]] whileTrue:[ The problem with the above is that it'll become and infinite loop if sizeOrNil is nil. So, if you want to read while there's data available from the stream, then I think the following is the right choice: [ srcStream atEnd not and: [ sizeOrNil isNil or: [ totalRead < sizeOrNil ] ] whileTrue: [ The above always takes into account whether there's data available from the stream and directly uses sizeOrNil for size comparison to make it more legible. However, there seem to be a few more things to clean up in that method. Levente > > ? > > > My logic could be wrong, and I don't understand the sizeOrNil check here. But after I made this change, the infinite loop went away, GC destroyed hundreds of stale Sockets and Processes, and my Seaside app began answering > again (note that this is the first time I've been successful at bringing it back to life). > > If the original logic is correct as intended, it seems to be a trade-off between continuing to read from the stream if we haven't yet read #size characters, versus not trying to read from a stream that is #atEnd. My version > breaks out of the loop if the stream is #atEnd even if the method believes there are characters left to read. The reason I think this is worthwhile, is that #totalRead will (endlessly) be 0 if the Socket underneath the > SocketStream is closed, and there are no more bytes to read, thus causing an infinite loop. It is *possible* (and mentioned below) that the design is supposed to catch a #ConnectionClosed exception, but in my case one never > got thrown. (note the Socket's state was #otherEndClosedButNotThisEnd) > > It's also possible that the method is designed to continue reading from the SocketStream in case it is expecting more data which just hasn't arrived yet (as I think I saw in a method comment somewhere), but this may not make > sense when the underlying Socket has been closed and the method keeps reading zero characters from it. > > Of course, actually, anything is possible. What do I know? :D > > The rest of this message contains my exploration, debugging session, and further rationale. > > Thanks, > a Tim > > ------- > > After running for eight days, Squeak had 300 socket connections open. This is far too many for my very, very unpopular website: > > $ sudo netstat -a --program | awk '/http.*squeak/ { print $6 }' | sort | uniq -c > 297 CLOSE_WAIT > 6 ESTABLISHED > 1 LISTEN > > A scan of Processes belonging to WebServer returned over 500, and climbing: > > Process allSubInstances select: [:proc | > proc suspendedContext > ifNil: [ false ] > ifNotNil: [:context | > context closure > ifNil: [ false ] > ifNotNil: [:closure | | receiver | > receiver := closure outerContext receiver. > (receiver respondsTo: #outerContext) > ifTrue: [ receiver outerContext receiver class == WebServer ] > ifFalse: [ false ] ] ] ] > > > Note that none of them are terminated: > > Bag newFrom: (self collect: [:proc | proc isTerminated]) > -> a Dictionary(false->507 true->1 ) > > Most/all of them seem to be hanging out in this method: > > method: (BlockClosure>>#newProcess "a CompiledMethod(3871190)") > closureOrNil: [closure] in BlockClosure>>newProcess > receiver: [closure] in WebServer>>asyncHandleConnectionFrom: > > It seems that WebServer uses SocketStream on a Socket. So how many SocketStreams exist and where are they? > > Bag newFrom: (SocketStream allInstances collect: [:ea | ea socket statusString asSymbol]) > -> a Dictionary(#invalidSocketHandle->32 #otherEndClosedButNotThisEnd->1 ) > > So, there's one SocketStream that's closed but we're holding on to it. Why? > > I opened a Process Browser, turned on CPU Watcher, turned on Auto-Update and found WebServer's listener process eating > 90% CPU. I opened a debugger on it. It seemed it was in an infinite loop centered > around WebMessage>>streamDirectlyFrom:to:size:progress: on this very request. > > [] in SocketStream>>beSignalingWhile: > BlockClosure>>ensure: > SocketStream>>beSignalingWhile: > [] in SocketStream>>next:into:startingAt: > BlockClosure>>on:do: > SocketStream>>next:into:startingAt: > WebRequest(WebMessage)>>streamDirectlyFrom:to:size:progress: > WebRequest(WebMessage)>>streamFrom:to:size:progress: > [] in WebRequest(WebMessage)>>getContentWithProgress: > ByteString class(SequenceableCollection class)>>new:streamContents: > WebRequest(WebMessage)>>getContentWithProgress: > WebRequest(WebMessage)>>getContent > WebRequest(WebMessage)>>content > WAWebServerAdaptor>>requestBodyFor: > WAWebServerAdaptor(WAServerAdaptor)>>requestFor: > WAWebServerAdaptor(WAServerAdaptor)>>contextFor: > WAWebServerAdaptor(WAServerAdaptor)>>process: > WAWebServerAdaptor>>process: > MessageSend>>valueWithArguments: > WebServer>>invokeAction:request: > > Some exploit attempt sending an HTTP POST request to /tmUnblock.cgi and with a Content-Length of 227 was causing WebMessage to lose its mind. (This is when I realized that my nginx proxy configuration might be wrong, and > some obvious exploit attempts could be getting through to Squeak. But that's a different matter.) > > Now I believe any of the following could have been happening: > > * WebMessage & friends seem to take the supplied Content-Length header in an HTTP request as being literal and true, even though a client could be supplying a wrong or misleading value. I don't know if there are tests > against this but I might write one. > > * If the Socket is closed, the code on the stack trace above will keep trying to read from it. I can't tell if: > > a) there is an arithmetic problem somewhere, or > b) if #ConnectionClosed isn't being raised or caught in the proper way, or > c) if there's a logic error in WebMessage>>streamDirectlyFrom:to:size:progress: , or > d) this is totally intentional, or > e) all of the above, none of the above, or something else. > > I opted to fix (c). > > * This portion of WebMessage>>streamDirectlyFrom:to:size:progress: is troublesome to me, and my change seems to make the infinite loop end: > > [(sizeOrNil == nil and:[srcStream atEnd not]) or:[totalRead < size]] whileTrue:[ > > In this case, totalRead was always less than size, because the connection was closed and the methods would always return empty strings for what had been read (thus, totalRead = 0). So, [totalRead < size] was always True, > which would override the check for [srcStream atEnd not] and it would continue to try reading from a SocketStream on a closed Socket. > > > > > |
Hi Levente,
On Jun 20, 2020, at 8:18 AM, Levente Uzonyi wrote: > On Fri, 19 Jun 2020, Tim Johnson wrote: > >> I'm proposing a change to >> WebMessage>>#streamDirectlyFrom:to:size:progress: >> ...but my proposed change could very well be wrong, or be poor >> logic. So I'm writing up my lengthy rationale below. >> What do you think about changing: >> [(sizeOrNil == nil and:[srcStream atEnd not]) or:[totalRead < >> size]] whileTrue:[ > > I read the above line (combined with the assignment to size and > totalRead) as: if no size was specified (sizeOrNil == nil, size = > 0), then read while there's data available from the source stream > (srcStream atEnd not). > If there was a size specified, assume that its the actual number of > bytes available, and read that many bytes (totalRead < size). I did not find any code path in which sizeOrNil might be nil, but I may not have looked hard enough, or perhaps it was there for future expansion, etc. > The last line is problematic, as you pointed out in your analysis, > when the source stream and the size are both provided by the other > party. > >> to: >> [sizeOrNil == nil or: [totalRead < size and: [srcStream atEnd >> not]]] whileTrue:[ > > The problem with the above is that it'll become and infinite loop if > sizeOrNil is nil. > So, if you want to read while there's data available from the > stream, then I think the following is the right choice: > > [ srcStream atEnd not and: [ sizeOrNil isNil or: [ totalRead < > sizeOrNil ] ] whileTrue: [ > > The above always takes into account whether there's data available > from the stream and directly uses sizeOrNil for size comparison to > make it more legible. However, there seem to be a few more things to > clean up in that method. into my server image (running 5.2) and will see how things go. Strangely, the method's author is listed as MCPackageTest...? I've verified that the other methods in the WebMessage 'streaming' category in WebClient-Core-ul.122.mcz are all 'ar' and 'pre' as expected... it is only this one that is attributed to 'MCPackageTest'... hmm. Thanks again, a Tim Picture 10.png (27K) Download Attachment |
Hi Tim,
On Mon, 22 Jun 2020, Tim Johnson wrote: > Hi Levente, > > On Jun 20, 2020, at 8:18 AM, Levente Uzonyi wrote: > >> On Fri, 19 Jun 2020, Tim Johnson wrote: >> >>> I'm proposing a change to >>> WebMessage>>#streamDirectlyFrom:to:size:progress: >>> ...but my proposed change could very well be wrong, or be poor logic. So >>> I'm writing up my lengthy rationale below. >>> What do you think about changing: >>> [(sizeOrNil == nil and:[srcStream atEnd not]) or:[totalRead < size]] >>> whileTrue:[ >> >> I read the above line (combined with the assignment to size and totalRead) >> as: if no size was specified (sizeOrNil == nil, size = 0), then read while >> there's data available from the source stream (srcStream atEnd not). >> If there was a size specified, assume that its the actual number of bytes >> available, and read that many bytes (totalRead < size). > > Ah! Thank you. That makes sense. > > I did not find any code path in which sizeOrNil might be nil, but I may not > have looked hard enough, or perhaps it was there for future expansion, etc. It's used when you send a request to a server and the response does not contain the Content-Length header and the Transfer-Encoding is not chunked. > >> The last line is problematic, as you pointed out in your analysis, when the >> source stream and the size are both provided by the other party. >> >>> to: >>> [sizeOrNil == nil or: [totalRead < size and: [srcStream atEnd not]]] >>> whileTrue:[ >> >> The problem with the above is that it'll become and infinite loop if >> sizeOrNil is nil. > > Oops. > >> So, if you want to read while there's data available from the stream, then >> I think the following is the right choice: >> >> [ srcStream atEnd not and: [ sizeOrNil isNil or: [ totalRead < sizeOrNil ] >> ] whileTrue: [ >> >> The above always takes into account whether there's data available from the >> stream and directly uses sizeOrNil for size comparison to make it more >> legible. However, there seem to be a few more things to clean up in that >> method. > > Thank you! I have loaded WebClient-Core-ul.122.mcz from the Inbox into my > server image (running 5.2) and will see how things go. > > Strangely, the method's author is listed as MCPackageTest...? > Thanks. I had a look at MCPackageTest from the same image I changed the method. MCPackageTest is a subclass of MCTestCase which mangles with authorInitials. It seems the unwind code restoring authorInitials didn't execute. I noticed the strange initials when I saved the package but I didn't think about the methods being affected as well. Levente |
Free forum by Nabble | Edit this page |