Hello all,
One of my apps does a lot of serial communications; we even had to shop around for ways to add more serial ports for at least a couple of installations. The evolving picture is that things work fine over native serial ports and don't do so well over the added ports. Conclusive testing has been complicated by a lack of spare hardware (slowly getting fixed) and a chaotic environment in which equipment is sometimes disconnected at no extra charge :)

We tried and "fired" one card vendor; despite the chaos, we were able to get clear evidence that sending data over their card would lock up a Win9x machine in short order. In contrast, that same machine running over native serial ports goes for weeks at a time, as long as its power cords are left in place. That card's vendor seemed completely unwilling and/or unable to help, so enter card vendor number two.

I am working on a similar test with the new card; so far, I've demonstrated failure with the card and will soon switch to trying to show success on native ports. At first, I'll leave the card in place and simply not use it. This vendor was kind enough to loan me an extra card to make the testing easier.

The failure mode of the first card on Win9x was a complete vapor lock of the machine: a frozen screen, complete with a visible but non-moving mouse cursor, and the only recourse was to reset. The new card on Win9x does a little better; the app will freeze, but it's possible to interact with the machine and, among other things, run a debug viewer. The machine eventually locks up when my app is terminated via the task list, but it's a big improvement over "Don't know, it just quit :(".

When the app hangs on Win9x with the new card, the output on the debug viewer is a rapid stream of messages from the card's driver. Of course, that doesn't necessarily mean that the driver caused the problem. This condition can arise almost any time; it seems to take several hours to a couple of days, though I've seen it happen in as little as two hours.
The new card on NT has a different failure mode: a blue screen. So far, this hasn't happened in less than two days, and has taken up to a week to appear. One of our ongoing studies requires lots of serial ports and reliable data collection. We've been able to get that by running on NT and simply remembering to reboot before each case.

There are some indications that the system gets "tired" after a couple of days; it's hard to explain, so I won't try until I can quantify it. If there's anything to it, it might be due to memory or resource leaks, either in my code or in the driver. One theory that I've kicked around is that the rapid Win9x failures could be the fault of the drivers, while the longer-term NT meltdown might be caused by some kind of starvation that could turn out to be my doing.

When I refer to days and weeks, keep in mind that it's unclear whether the time interval itself matters, or whether two days or one week was simply the time required to reach a certain number of connect/disconnect cycles on one of the devices, or some other threshold.

There's probably more that I should say, but I can't think of it right now. Suggestions for debugging tools and/or strategies would be most appreciated.

Have a good one,

Bill

--
Wilhelm K. Schwab, Ph.D.
[hidden email]
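One way to separate "elapsed time" from "number of connect/disconnect cycles", and to catch unbalanced opens in the application's own code, is to log both at every open and close. The sketch below is in Python purely for illustration (the real app is in Dolphin Smalltalk); `PortActivityLog` and its methods are hypothetical names, and it assumes the app can call `opened()` and `closed()` around its own port open/close calls. It can only reveal leaks in the app's bookkeeping, not anything happening inside the driver.

```python
import time
from collections import Counter


class PortActivityLog:
    """Hypothetical bookkeeping helper: counts completed open/close cycles
    per port and tracks opens that were never matched by a close."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._start = clock()
        self.cycles = Counter()    # completed open/close cycles per port
        self.open_now = Counter()  # opens not yet matched by a close

    def opened(self, port):
        self.open_now[port] += 1

    def closed(self, port):
        if self.open_now[port] == 0:
            raise ValueError(f"close without a matching open on {port}")
        self.open_now[port] -= 1
        self.cycles[port] += 1

    def snapshot(self):
        """Elapsed seconds, total completed cycles, and ports still open."""
        unclosed = {p: n for p, n in self.open_now.items() if n > 0}
        return {
            "elapsed_s": self._clock() - self._start,
            "total_cycles": sum(self.cycles.values()),
            "unclosed": unclosed,
        }


if __name__ == "__main__":
    now = [0.0]  # fake clock so the example is repeatable
    log = PortActivityLog(clock=lambda: now[0])
    for _ in range(3):
        log.opened("COM5")
        log.closed("COM5")
    log.opened("COM6")  # never closed
    now[0] = 120.0
    print(log.snapshot())
    # {'elapsed_s': 120.0, 'total_cycles': 3, 'unclosed': {'COM6': 1}}
```

Writing a snapshot to a file every few minutes would let each crash be lined up against both the clock and the cycle count after the fact.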
Does your app make much use of COM? I've got an app (written in VB) which makes heavy use of COM, and is very unreliable on Win9x but runs fine on NT.

On NT, in my experience, applications, no matter how badly written or ill-behaved, don't crash NT unless you're intentionally using some of the extremely low-level stuff in the Driver Development Kit, and my guess is that you're not. If you're getting a BSOD (Blue Screen Of Death), it's very likely a driver problem. What's the failure cause listed on the BSOD? IRQL_NOT_LESS_OR_EQUAL or something like that? What's the PC (program counter) at when the machine goes down, and what's in the module list shown on the BSOD? That's the first thing (and sometimes the ONLY thing) you should look at.

Also, make sure the machine is set up to record a crash dump. To do this, go to Control Panel, open the System applet, click on the Startup/Shutdown tab, and make sure the "Write debugging information to:" checkbox is checked, and that there's a filename (commonly %SystemRoot%\MEMORY.DMP) in the text box.
Bob,
> Does your app make much use of COM? I've got an app (written in VB)
> which makes heavy use of COM, and is very unreliable on Win9x but runs
> fine on NT.

It makes some use of COM, but not much. It's no different in that respect from any of the other apps that work fine.

> On NT, in my experience, applications, no matter how badly written or
> ill-behaved, don't crash NT unless you're intentionally using some of
> the extremely low-level stuff in the Driver Development Kit, and my
> guess is that you're not.

Not directly, at least. I do open and close serial ports, but that's all through the Windows API.

> If you're getting a BSOD (Blue Screen Of Death), it's very likely a
> driver problem. What's the failure cause listed on the BSOD?
> IRQL_NOT_LESS_OR_EQUAL or something like that?

Sadly, I don't remember. But another one will appear :( and I'll take note.

> and what's in the module list
> shown on the BSOD?

I recorded this on one of the early crashes; the vendor seemed to think it was their problem, but wasn't terribly interested in the text. What they really want to do is reproduce the failure in their lab, but my suspicion is that there's no way to generate it without the external devices and lots of other stuff connected to them. Interestingly, they wanted to go after the problem in the Win9x driver first because it's easier to reproduce.

> That's the first thing (and sometimes the ONLY thing) you should
> look at. Also, make sure the machine is set up to record a crash dump.
> To do this, go to Control Panel, open the System applet, click on the
> Startup/Shutdown tab, and make sure the "Write debugging information
> to:" checkbox is checked, and that there's a filename (commonly
> %SystemRoot%\MEMORY.DMP) in the text box.

That's a new one! Thanks!

Bill

--
Wilhelm K. Schwab, Ph.D.
[hidden email]
"Bill Schwab" <[hidden email]> wrote in message
news:949p1k$49e$[hidden email]...
> The evolving picture is that things work fine over native
> serial ports, and don't do so well over the added ports.

This might be related to the add-on ports using their own drivers as opposed to the standard drivers. I've done some low-level serial port coding (in my DOS days), and there are some tricky sequences involved to guarantee nothing will go wrong. There are several FAQs on the net about serial ports that go into this, not that that would do much good unless you were working on the driver code directly.

It is just hell troubleshooting something where you have to wait 2 days to 2 weeks for the problem to manifest itself. Eliminating that delay is one of my favorite things about the Linux "stress test" I've mentioned earlier for testing PC hardware. Would there be any way to set up a serial port "stresser"? Perhaps set up a machine that does nothing but send a particular sequence (or a variety of sequences) over and over to the target machine? The point would be to get a quicker yes or no answer as to whether the serial port is working and whether a change you make has an effect on the problem.

Well, there are lots of reasons why the following won't work in your situation, but I'll toss them out just in case they help or lead to another idea:

Use a separate machine as a serial port server. Stuff it with ports and let it communicate with the main processing machine, which would need only one serial port (or Ethernet, etc.). This serial port server might even be a very low-end machine. It could run DOS.

Or, as above, but use Linux on the serial port server.

Either of these could be done with little or no disk space (just a floppy) and modest RAM: say 4 MB for Linux (8 MB is better), or 1 MB on a DOS machine.
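With a serial port server, the data from many ports has to share one link to the main machine. A minimal Python sketch of one possible framing scheme, assuming a hypothetical format of a 1-byte port number and a 2-byte big-endian length ahead of each chunk; nothing in this thread specifies a wire format. Because a serial line or TCP connection delivers bytes without message boundaries, the decoder buffers partial frames until they are complete.

```python
import struct

# Hypothetical header: port number (0-255), then payload length (0-65535),
# both big-endian.
HEADER = struct.Struct(">BH")


def encode(port, payload):
    """Wrap one chunk of data read from `port` for the shared link."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload too long for a 16-bit length field")
    return HEADER.pack(port, len(payload)) + payload


class Decoder:
    """Reassembles (port, payload) frames from a byte stream that may be
    split at arbitrary points."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, data):
        """Append received bytes; return any frames that are now complete."""
        self._buf += data
        frames = []
        while len(self._buf) >= HEADER.size:
            port, length = HEADER.unpack_from(self._buf)
            end = HEADER.size + length
            if len(self._buf) < end:
                break  # the rest of this frame has not arrived yet
            frames.append((port, bytes(self._buf[HEADER.size:end])))
            del self._buf[:end]
        return frames


if __name__ == "__main__":
    wire = encode(3, b"ATZ\r") + encode(7, b"hello")
    d = Decoder()
    print(d.feed(wire[:5]))  # []  (the first frame is 7 bytes long)
    print(d.feed(wire[5:]))  # [(3, b'ATZ\r'), (7, b'hello')]
```

There is no checksum or resynchronization here; over a raw serial link one would probably want both, since a single corrupted length field would misalign every frame after it.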
Run the serial ports on the Windows machine in a DOS box so they talk to the hardware more directly. Use DOS drivers, where perhaps the interrupt enabling/disabling sequences are better solved and tweakable by you if necessary. (I can supply some serial port code if it helps, if you wind up working at the DOS level.) This might work better on a W9x machine than on NT. Then figure out how to communicate between the DOS serial port front-end and your main Dolphin application (and tell me how you do it!).

(Various multi-serial port vendors advertise to the Linux community. I wonder whether any of those cards work more reliably than the ones you've been using. Although it is probably entirely a matter of the drivers, and not the actual serial ports, causing the problem.)

Well, that's all that comes to mind. Good luck. I'll look forward to reports of how it all resolves.

--
Frank
[hidden email]
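Frank's serial port "stresser" comes down to sending a known sequence over and over and checking, byte for byte, that it arrives intact. A minimal Python sketch of the generate-and-verify half, assuming a hypothetical `send_and_receive` callable supplied by the caller (in practice it might write the data out one port and read it back from the port under test through a null-modem cable). The pattern walks through all 256 byte values from a seed, so both ends can regenerate it independently. If software (XON/XOFF) flow control is enabled, bytes 0x11 and 0x13 would be consumed by the driver and the pattern would need to skip them.

```python
def pattern(length, seed=0):
    """Deterministic test data: consecutive byte values starting at `seed`,
    wrapping at 256, so sender and receiver can each regenerate it."""
    return bytes((seed + i) % 256 for i in range(length))


def check_stream(received, expected_len, seed=0):
    """Return the offset of the first bad byte, or None if the data is intact.

    Corrupted or dropped bytes usually show up as a content mismatch; a
    short or long read is reported at the offset where the lengths diverge."""
    expected = pattern(expected_len, seed)
    for offset, (got, want) in enumerate(zip(received, expected)):
        if got != want:
            return offset
    if len(received) != expected_len:
        return min(len(received), expected_len)
    return None


def run_rounds(send_and_receive, rounds, length=4096):
    """Run `rounds` send/verify cycles through a caller-supplied
    send_and_receive(bytes) -> bytes function (hypothetical).

    Each round uses a different seed, so leftover data from the previous
    round fails the check instead of passing as good data.
    Returns a list of (round, bad_offset) pairs for failed rounds."""
    failures = []
    for r in range(rounds):
        seed = r % 256
        got = send_and_receive(pattern(length, seed))
        bad = check_stream(got, length, seed)
        if bad is not None:
            failures.append((r, bad))
    return failures


if __name__ == "__main__":
    # Loopback stand-in for a real port pair: returns exactly what was sent.
    print(run_rounds(lambda data: data, rounds=10))  # []
```

Run against real hardware, an empty failure list after many rounds is the quick "yes", and the first failing round and offset give something concrete to compare before and after a driver or configuration change.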