Hello all,
Acting on some good advice offered in this group, I've been working on a problem I described earlier with system lockups when using a board that adds serial ports. Among lots of little steps, there is one very noteworthy thing happening: an ME box is (so far) working! It's been up for over 50 hours now. That's not unprecedented on 9x, but it is HIGHLY unusual. I'm gradually cycling 9x boxes through a torture test in my office and "active duty"; it's slow because of safety rules and logistics in general.

Let's assume for a moment that something about ME was the answer. Then the one thing in all of this that _really_ bothers me is the relative fragility of the NT machines that I've been (fairly recently) using. The one doing the same job as the ME box fails (blue screen) every couple of days, but it's predictable, and it serves well as long as we reboot it before we really need it. Another NT box, doing a different job, blue-screened on me not too long ago.

Watching the ME box do as well as it has so far has me wondering whether the NT machines are properly patched. What should I have installed in the way of service packs?

Have a good one,

Bill

-- 
Wilhelm K. Schwab, Ph.D.
[hidden email]
Bill
You wrote in message news:989bvg$12e8n$[hidden email]...
> [snip]
> Let's assume for a moment that something about ME was the answer. Then the
> one thing in all of this that _really_ bothers me is the relative fragility
> of the NT machines that I've been (fairly recently) using. The one doing
> the same job as the ME box fails (blue screen) every couple of days, but,
> it's predictable, and serves well as long as we reboot it before we really
> need it. Another NT box, doing a different job, blue screened on me not too
> long ago.

I would still take that as an indication of problems with device drivers. NT can certainly run reliably (I used to run it continuously for weeks on end, although Andy liked to reboot regularly, especially when there was a Y in the day :-)), but it is very easy to compromise it with poor drivers, as they essentially run as if they are part of the OS, and if they misbehave they can trample over its memory. One can't blame Microsoft for the poor drivers, although one can blame them for the design. Under Win2K they have addressed this to some extent with driver signing and certification, and that certainly seems to have helped a great deal.

> Watching the ME box do as well as it has so far has me wondering whether the
> NT machines are properly patched. What should I have installed in the way
> of service packs?
The ME box won't be sharing the same device drivers, so if that is indeed the problem then perhaps the ME ones are better for the h/w you are using. Regarding service packs, I think SP6a is the latest, but from SP3 onwards it was pretty stable.

Regards

Blair
"Blair McGlashan" <[hidden email]> wrote in message
news:98bel1$18au3$[hidden email]...
> Bill
>
> You wrote in message news:989bvg$12e8n$[hidden email]...
> > [snip]
>
> I would still take that as an indication of problems with device drivers.

I too suggest that the problem is either a device driver or something related to a device driver, such as a bad registry setup. I've been using Win95/NT (for NT stability), followed by Win98 (for its Explorer UI) and then Win2K (for both stability and UI; Win2K is distinctly superior to its predecessors) for many years now. I don't play computer games, but I'm told that with the advent of DirectX 8 for Win2K you can now play most games on it, which had apparently been a major reason for sticking with Win98. Our primary Win2K domain servers get rebooted maybe once every six months to a year, when we add service packs or upgrade to the next version.
We install little or nothing on the domain servers other than required/stock elements for IIS etc. We especially avoid installing things like MS-Office components or ANY Outlook server stuff, both of which, in my experience, have a marked effect on stability.

OTOH, my dev boxes often need rebooting every week or two because of some spoo that was caused by:

o a device driver
o Windows Media services
o the Visual Studio debugger or IntelliSense scanner running amok and dying, leaving hooks everywhere
o some stupidly written installer that "thinks" it requires a reboot.

** For the paranoid, or those with painful install-related woes/experience.

When installing new software, it is wise to reboot your machine and *then* install the new software as the *first* thing you do after:

o your machine has rebooted
o you've logged on
o you've opened the Task Manager and observed that the CPU indicator is steadily showing the "System Idle Process" running at 99%. I.e., your machine is now in steady-state idle mode and nothing else is happening.

If you're really paranoid, here are some additional thoughts:

o back up your registry and system directory before any install
o partition your drive to have an emergency mini (1GB partition) Win2K/NT system you can use to repair your primary system
o organize your boot drive to be 2-4GB so you can perform partition backups of the system before installing new and possibly suspicious drivers. Then get yourself a copy of BOTH "Partition Magic" and "Partition Commander".

** In other words, it is much faster to copy and restore a partition than to try to repair a system, or worse, to have to re-install and recover it. Since big drives are so inexpensive these days, there is little reason not to have a small system partition and safety backups.
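The "wait until Task Manager shows the System Idle Process steadily at about 99%" step in the install checklist above can be scripted. Below is a minimal Python sketch, assuming a caller-supplied function `sample_idle_percent` (hypothetical) that returns the current idle percentage from 0 to 100. It treats the machine as settled only after several consecutive idle readings, because a single quiet sample can be misleading.

```python
import time
from typing import Callable


def wait_for_idle(
    sample_idle_percent: Callable[[], float],
    threshold: float = 99.0,
    consecutive: int = 5,
    max_samples: int = 600,
    delay_s: float = 1.0,
) -> bool:
    """Poll until `consecutive` readings in a row report idle >= threshold.

    sample_idle_percent is a hypothetical, caller-supplied sampler.
    Returns True once the machine has looked idle long enough, or False
    if max_samples readings pass without reaching a steady idle state.
    """
    streak = 0
    for _ in range(max_samples):
        if sample_idle_percent() >= threshold:
            streak += 1
            if streak >= consecutive:
                return True
        else:
            streak = 0  # any busy reading restarts the count
        time.sleep(delay_s)
    return False
```

On a real box one might pass something like `lambda: 100 - psutil.cpu_percent(interval=1)` (psutil is a third-party package) together with `delay_s=0`, since `cpu_percent(interval=1)` already blocks for a second per reading.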
Furthermore, Win2K NTFS supports mount points and hard links just like Unix, so even if you have physically different drive partitions you can make it appear to all software as if it were just one big logical (single-letter) file system.

** One more thing: if your application is COM based and is using apartment threading, it can also hang. This was an area I had to address with special code in the AOS VM threading system for our multi-threaded Smalltalk debugger in SmalltalkAgents.

My first ugly experience with this was in 1996, while working on QKS Smalltalk for Win32 (SmalltalkAgents): my Eudora mail package or Internet Explorer would regularly appear to hang whenever a QKS Smalltalk thread was suspended, which led me into deep kernel debugging with SoftICE to figure out the problem. In my direct experience, it has been a historical problem on all Win32 versions [noting that Win2000 is specially architected to minimize the issue].

I see this problem on my current Win2K dev box nearly every day or two when I launch Internet Explorer 5.5 with .NET spoo and Visual Studio 7 crap installed. For some reason I've not bothered to chase down, IE will sometimes leave a zombie process around when you close its last window. That zombie process causes COM to hang, so that when I save my QKS Smalltalk environment/image it hangs until I kill the IE process. At that instant, the QKS Smalltalk process continues running as if nothing had happened.

Note that this is on Win2K, where Microsoft has worked hard to minimize the hanging problem; on any other Win32 system it is much worse. On Win2K, though, it seems to affect COM initialization rather than necessarily a COM component.

I am not aware of Microsoft ever documenting or commenting on this problem. For many years, until Win2K, it was my "fun" demo of how to bring any Win32 box (even a supposedly process-safe NT one) to its knees.
I.e., you could build an errant thread that initialized itself as a COM apartment-threaded thread and then suspended itself, whereupon many aspects of the machine (such as any use of IE or an in-proc web-browser view) would suddenly hang ;-).

Adjunct comment: "Windows Task Manager" is your friend for killing such applications.

================

The problem is that cross-thread COM calls to apartment-threaded components must be made by sending messages to the thread that owns the component. They apparently do so in a rather stupid broadcast mode (at least until Win2000, where it is unclear what they're doing). Basically, once a thread gets COM/OLE initialized, there is a secret/private/invisible/hidden window that COM installs to manage message sends. If the thread doesn't process messages in a timely fashion, this COM window doesn't get serviced, which means that anything that sends it an inter-thread message will hang.

** The rules/mechanism are a little different on Win2K, but the basic problem still exists. **

The net result is that if you have any non-worker COM-initialized threads [i.e., threads with an event queue] that are not running an event loop, then they block ALL cross-thread/intra-process COM activity for a CALLING component.

That said, the EVILLY BAD code in RichEdit (1/2/3), among other crap, causes any thread in which it has a window to become a COM apartment-model initialized thread. So if you have some COM thread anywhere in your system that is not responsive, and you have a rich edit component (window) for a given (Smalltalk) thread, that thread will appear to be frozen/dead.

Generally speaking, if you have any activity that results in a cross-thread COM call while some COM-enabled thread anywhere in your system is not processing messages (looping) in a timely fashion, then ALL apartment-threaded components are at risk.

-- 
Dave Simmons [www.qks.com / www.smallscript.com]
"Effectively solving a problem begins with how you express it."
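The hang Dave describes comes down to this: callers in other threads can only reach an apartment-threaded object by asking its owning thread to do the work, so an owner that stops servicing its queue stalls every caller. The following is a toy Python model of that idea, not real COM. It assumes the simplification that a cross-apartment call is just a request posted to the owner's queue (standing in for COM's hidden window) plus a wait for the reply. `ToyApartment` and its methods are hypothetical names, and the caller-side timeout exists only so the demo can observe the stall instead of blocking forever.

```python
import queue
import threading


class ToyApartment:
    """Toy single-threaded apartment (NOT real COM).

    Only the owner thread touches the wrapped object; other threads post
    requests to the owner's queue and block until the owner services them.
    """

    def __init__(self, obj):
        self._obj = obj
        self._requests = queue.Queue()
        self._running = threading.Event()
        self._running.set()
        self._owner = threading.Thread(target=self._message_loop, daemon=True)
        self._owner.start()

    def _message_loop(self):
        while True:
            method, args, reply = self._requests.get()
            if method is None:  # shutdown request
                return
            self._running.wait()  # a suspended owner leaves requests unserviced
            try:
                reply["value"] = getattr(self._obj, method)(*args)
            except Exception as exc:  # hand errors back to the caller
                reply["error"] = exc
            reply["done"].set()

    def suspend_owner(self):
        """Simulate the errant thread that initializes COM and then suspends."""
        self._running.clear()

    def resume_owner(self):
        self._running.set()

    def call(self, method, *args, timeout=None):
        """Cross-thread call: post the request, then wait for the owner."""
        reply = {"done": threading.Event()}
        self._requests.put((method, args, reply))
        if not reply["done"].wait(timeout):
            raise TimeoutError(f"{method!r}: owner thread is not servicing its queue")
        if "error" in reply:
            raise reply["error"]
        return reply["value"]

    def shutdown(self):
        self._running.set()  # let any stalled request finish first
        self._requests.put((None, (), None))
        self._owner.join()
```

Resuming the owner lets the stalled request run to completion, much as killing the zombie IE process lets the Smalltalk image carry on "as if nothing had happened". In the scenario Dave describes, the caller has no timeout and simply appears frozen.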
I forgot to mention that another significant source of Windows box instability is mixing memory with different timing characteristics (or just poor-quality memory from different manufacturers). Depending on the manufacturer and BIOS settings of your motherboard, this can lead to instability that is much more exposed under an NT/2K kernel.

I was badly bitten by this problem with WinNT 3.5 and 4.0 early on.

-- 
Dave Simmons [www.qks.com / www.smallscript.com]
"Effectively solving a problem begins with how you express it."
In reply to this post by David Simmons
David Simmons wrote:
<big snip>
> I am not aware of Microsoft ever documenting or commenting on this
> problem.

OK, this is *way* off topic, but I encountered another Microsoft problem for which I've never read documentation. The format of Word 97 documents depends on the resolution at which you print them. Change the resolution, and formatting properties such as tab alignment, word wrap and pagination may change. This happened to me recently, and two friends confirmed that it happened to them. Just take any heavily formatted document you have, go to the appropriate printer dialog, change the resolution from, say, 600 to 300 dpi, and watch what happens. I suspect their formatting algorithm handles round-off errors in a brain-damaged way.

The net result is that documents can become misformatted with printer upgrades, and collaborative work on documents leads to subtle errors. Send your resume to a recruiter, and the format can change. I encountered this error in Word 97, and I think in Excel 97, on Win NT. I don't know whether it exists in newer MS products.
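The round-off suspicion above is plausible. The following is a hypothetical Python illustration of how resolution-dependent wrapping *could* arise. It is not a description of how Word actually lays out text. It assumes a naive layout routine that rounds each word's width and the space width (given in points, 72 per inch) to whole device dots at the printer's resolution, sums those rounded values, and breaks the line when the total exceeds the rounded line width.

```python
from typing import List, Sequence

POINTS_PER_INCH = 72


def to_dots(points: float, dpi: int) -> int:
    """Round a length in points to whole device dots (the naive step)."""
    return round(points * dpi / POINTS_PER_INCH)


def break_lines(word_widths_pt: Sequence[float], space_pt: float,
                line_width_pt: float, dpi: int) -> List[List[int]]:
    """Greedy word wrap carried out entirely in rounded device dots.

    Returns the indices of the words that land on each line.
    """
    limit = to_dots(line_width_pt, dpi)
    space = to_dots(space_pt, dpi)
    lines: List[List[int]] = []
    current: List[int] = []
    used = 0
    for i, width_pt in enumerate(word_widths_pt):
        w = to_dots(width_pt, dpi)  # each word is rounded separately
        needed = w if not current else used + space + w
        if current and needed > limit:
            lines.append(current)  # word doesn't fit: start a new line
            current, used = [i], w
        else:
            current.append(i)
            used = needed
    if current:
        lines.append(current)
    return lines
```

With made-up numbers (twelve 9.744 pt words, 2.88 pt spaces, a 124 pt line), exact arithmetic fits ten words on the first line (123.36 pt). The 600 dpi layout agrees and wraps 10 + 2. At 300 dpi, each word rounds from 40.6 up to 41 dots. The accumulated error pushes the tenth word over, giving 9 + 3. Any per-item rounding that is summed rather than computed once from exact totals can behave this way.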
In reply to this post by David Simmons
Dave,
> I forgot to mention that another significant source of Windows box
> instability is mixing memory with different timing characteristics (or just
> poor quality memory from different manufacturers). Depending on the
> manufacturer and bios settings of your motherboard this can lead to
> instability which is much more exposed under an NT/2K kernel.
>
> I was badly bitten by this problem with WinNT 3.5 and 4.0 early on.

Interesting. I'll keep this one in mind!

Re your COM hanging suggestion, I wish you had been around a couple of years ago :) I (almost certainly) ran into that problem with a commercial app that managed to hang my apps; the details are a blur now, but we were able to get around it by installing an update to the commercial app.

Thanks!

Bill

-- 
Wilhelm K. Schwab, Ph.D.
[hidden email]
In reply to this post by Blair McGlashan
Blair,
> I would still take that as an indication of problems with device drivers.

I've made some progress toward reproducing the problem outside of a thoracic OR. A Win95 machine choked up while trying to read from an emulator for one of the monitors. It took days to happen, but I've turned up the pace (a lot) to see whether the failure will happen sooner. If I can make something that turns ugly, the vendor will be able to (and I believe will) turn their hardware debuggers on the problem.

> NT
> can certainly run reliably (I used to run it continuously for weeks on end,
> although Andy liked to reboot regularly especially when there was a Y in the
> day :-)), but it is very easy to compromise it with poor drivers as
> they essentially run as if they are part of the OS and if they misbehave then
> they can trample over its memory. One can't blame Microsoft for the poor
> drivers, although one can blame them for the design. Under Win2K they have
> addressed this to some extent by having driver signing and certification,
> and it certainly seems to have helped a great deal.

It probably is drivers, but if it is my fault, I would expect either (1) a memory/resource leak or (2) a threading problem. In my early days of making serial port calls from background threads, I caused some ugly system crashes (on Win95) by having two threads that each felt responsible for closing the port. That was fixed easily enough by closing from an #ensure: block in one of the threads.

> > Watching the ME box do as well as it has so far has me wondering whether the
> > NT machines are properly patched. What should I have installed in the way
> > of service packs?
>
> The ME box won't be sharing the same device drivers,

They won't share with NT, but they would with other 9x machines (is that correct??). A secondary concern of mine is that the ME boxes are doing a lot better than the Win95 machines; but, ...

> so if that is indeed
> the problem then perhaps the ME ones are better for the h/w you are using.

... this is perhaps all the more true for the Win95 machines.

> Regarding service packs, I think SP6a is the latest, but as of SP3 onwards
> it was pretty stable.

Thanks!

Bill

-- 
Wilhelm K. Schwab, Ph.D.
[hidden email]
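Bill's fix for the double-close crash was to make exactly one thread own the port and close it from an #ensure: block. Python's try/finally is a close analogue. The sketch below is hypothetical: `FakePort` stands in for a real serial port object (a library such as pySerial would supply one) and counts `close()` calls so a double close would show up. Only the reader thread ever closes the port; other threads merely ask it to stop.

```python
import threading


class FakePort:
    """Hypothetical stand-in for a serial port; counts close() calls."""

    def __init__(self):
        self.close_count = 0
        self._lock = threading.Lock()

    def read(self) -> bytes:
        return b"\x00"  # pretend one byte arrived

    def close(self) -> None:
        with self._lock:
            self.close_count += 1


def reader(port, stop: threading.Event, received: list) -> None:
    """The single thread that owns the port and alone is responsible for closing it."""
    try:
        while True:
            received.append(port.read())
            if stop.wait(0.01):  # other threads only *ask* the owner to stop
                break
    finally:
        # Runs on normal exit and on an exception, much like
        # [ ... ] ensure: [ port close ] in Smalltalk.
        port.close()
```

A controlling thread calls `stop.set()` and then `join()`s the reader instead of closing the port itself. Because the `finally` clause is the only place `close()` is called, the port is closed exactly once, even if `read()` raises.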