Hi guys
During the DSU workshop we were brainstorming about what the most difficult bugs we have faced are, and what conceptual tools would have helped you.

Stef
For me it was when I was making the CPP library. CPP, for those who do not know, is a library I made that allows Pharo to control a C++ application. It's a very simple IPC bridge using shared memory-mapped files. I was surprised how easy it was to make on the C++ side; the Pharo side was a different story.

UFFI is brilliantly simple to use, BUT as much as Pharo rocks for debugging Pharo code, it works under one caveat: your code must make no use of UFFI or any other FFI, because the Pharo debugger has no means to debug UFFI calls. People may not realize it, but in practice UFFI is C coding. It may look like your usual Pharo code, but make no mistake, it's C all the way. So the practice is to make sure the code works in C first. However, if it works in C but for some strange reason makes Pharo freak out, you are all by yourself. Pharo code that can crash Pharo is the code that needs debugging the most.

So to find these bugs, of which fortunately there were not many, I had to crash Pharo countless times and move extremely slowly, step by step. Nonetheless, success was the final result and I learned a ton in the process.

On Thu, Mar 9, 2017 at 1:38 PM Stephane Ducasse <[hidden email]> wrote:
In reply to this post by Stephane Ducasse-3
2017-03-09 12:36 GMT+01:00 Stephane Ducasse <[hidden email]>:
Running an RBLintRule modified the (cached) AST of this method's code. So even though the compiled method did not change, and the "real" source code did not change, you actually see the source code generated from the modified AST.

A couple of things went wrong, or rather make this bug difficult:
1. A LintRule that should just *check* the code actually created a transformation somewhere behind the scenes (OK, it is an RBTransformationRule, originally used for code refactoring as well).
2. As long as we *analyze* code, the real code (string/AST/source) should be considered immutable.
3. The modified AST was cached.
4. What-you-see-is-not-what-you-get. We are (or I am) used to considering the system browser a tool for viewing and editing a method's source code. If the code we see isn't actually the method's source but another representation of some kind of (cached) model (formatted AST-node source code), it would be good to have some way to indicate this (switch between "raw" code / model code / AST-node code).
In reply to this post by Stephane Ducasse-3
Fixing a race condition in handling open sockets when forking an image. At first I had no clue where the problem could come from, then I spent a lot of time guessing at the conditions (of course, being a race condition there was no means to force the specific condition but I didn't know that yet).
Overall I spent about 6 weeks on this bug and finally fixed it by creating a new primitive to handle that specific case. I'm not sure what tools could have helped me, as this was a rather specific problem (OSProcess). But the hardest problems in my experience are usually concurrency / asynchrony (e.g. race conditions) or bugs in libraries (where you always assume that you must have made a mistake, never the library).

Max
We had to fix a Heisenbug (a bug that never breaks when you look at it). It happened only in production, didn't break every time, and never broke while debugging. It also got less frequent the more logging we added to report on the current state when the bug happened. We knew it was a timing issue, but exactly what was happening eluded us.

The trick, eventually, was logging. We kept adding and removing logging and mapping the state during the bug until we understood it on paper. Then we created a test that reproduced it, fixed it, and celebrated! It was really difficult! The lesson is that no matter how difficult or infrequent the bug, with enough persistence and hard work you can figure it out!

All the best,
Ron Teitelbaum

On Thu, Mar 9, 2017 at 9:38 AM Max Leske <[hidden email]> wrote:
> Fixing a race condition in handling open sockets when forking an image. [...]
In reply to this post by Stephane Ducasse-3
I don't think my reply will be of much use, but to me the craziest bug is the metabug, i.e. when the system doesn't provide any means to debug things. :) As for regular bugs... it is quite hard to remember anything I wasn't able to deal with, given enough time & effort, and then single out one case over the rest. And since human brains tend to forget unpleasant things, there aren't many details to tell and remember.

On 9 March 2017 at 13:36, Stephane Ducasse <[hidden email]> wrote:
Best regards,
Igor Stasenko.
In reply to this post by Stephane Ducasse-3
Thank you all. The idea is that we want to see how we can improve our debugging arsenal, so it is important that your scenarios give us some hints. It is difficult to convey what we are really looking for :)
-- Using Opera's mail client: http://www.opera.com/mail/
In reply to this post by Stephane Ducasse-3
Tracking down a problem where a header file looked like this:

    struct touch_screen_event {
    #ifdef SOME_FLAG
        ... other fields
    #endif
        int x;
        int y;
        int pressure;
    };

The touchscreen library was compiled with -DSOME_FLAG, but the code using that library didn't have the flag set. This means the code using the touchscreen events read x/y from the wrong offsets in memory. The touchscreen library's own example worked, while the real user's code didn't. It would have helped to embed struct sizes and offsets into the shared library to find differences at link time.

---

Keyboard handling in kdrive (an X server for embedded/mobile usage): after plugging/unplugging USB on the device, the keyboard started to generate wrong keycodes. In Linux (depending on your keyboard mode) every key event is represented as a byte(?). This worked for a long time, but then keyboards started to have more keys, so a special key value is used to indicate that a multi-byte sequence will follow. As it turns out, plugging/unplugging generated a multi-byte keyboard event... Not sure what would have helped? :)
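For the struct-layout mismatch described above, here is a minimal sketch of what "embedding sizes and offsets" might look like. It is not the actual touchscreen library: `ts_layout`, `TS_LAYOUT_HERE`, `ts_library_layout` and the `other_fields` placeholder are all made-up names, and the check runs when the client starts rather than at link time, which is a simpler variant of the idea in the post. The library exports its own view of the layout, and the client compares it with the view it got from the same header under its own compiler flags.

```c
#include <assert.h>   /* assert */
#include <stddef.h>   /* offsetof, size_t */

/* Hypothetical event struct modelled on the one in the post. The array is
   only a placeholder for the elided "other fields". */
struct touch_screen_event {
#ifdef SOME_FLAG
    int other_fields[4];
#endif
    int x;
    int y;
    int pressure;
};

/* Layout fingerprint: total size plus the offsets a client reads. */
struct ts_layout {
    size_t size;
    size_t off_x;
    size_t off_y;
    size_t off_pressure;
};

/* Expands in whichever translation unit uses it, so it captures that
   unit's view of the struct (with or without SOME_FLAG). */
#define TS_LAYOUT_HERE ((struct ts_layout){ \
    sizeof(struct touch_screen_event), \
    offsetof(struct touch_screen_event, x), \
    offsetof(struct touch_screen_event, y), \
    offsetof(struct touch_screen_event, pressure) })

/* Assumption: in a real setup this function is compiled into the shared
   library, so it reports the layout the library was built with. */
struct ts_layout ts_library_layout(void)
{
    return TS_LAYOUT_HERE;
}

/* Returns 1 when both sides agree on size and offsets, 0 otherwise. */
int ts_layout_matches(struct ts_layout a, struct ts_layout b)
{
    return a.size == b.size &&
           a.off_x == b.off_x &&
           a.off_y == b.off_y &&
           a.off_pressure == b.off_pressure;
}

/* Client-side guard: abort early instead of silently reading x/y from the
   wrong offset. Note that assert() is compiled out when NDEBUG is defined. */
void ts_require_matching_layout(struct ts_layout client)
{
    assert(ts_layout_matches(ts_library_layout(), client) &&
           "touch_screen_event layout differs between library and client");
}
```

A client would call `ts_require_matching_layout(TS_LAYOUT_HERE)` once at startup. With mismatched flags the two fingerprints differ and the program stops right there, instead of producing wrong coordinates much later.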
In reply to this post by Stephane Ducasse-3
We were talking during the workshop about the simple things that would help:
- watchpoints with a history of the value.
- stop if the value of an instance variable becomes irregular, i.e., you get integers and suddenly you get a float inside!
  1 1 1 1 2 3 4 2 2 1 1 12 2 3 32 1 1 1 1 22 1 1 1 1 1 1 1 2 33 1.2 2 3 3 4 4
- start to monitor (put alarms/breakpoints on) newly created instances from a given point.

I hope that soon we will have the instance-specific features of the Ressia debugger in the Pharo debugger (stop the next time this object receives a message, stop the next time this object changes any of its instance variables).
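To make the first two wishes above concrete, here is a language-neutral sketch in C, not the Pharo or Ressia debugger API. Every name in it (`watchpoint`, `wp_record`, `wp_recent`, `WP_HISTORY`) is invented for illustration. The watchpoint keeps the last few values written to a slot and reports when the kind of value changes, as in the integer-then-float sequence.

```c
#include <assert.h>   /* assert */
#include <stddef.h>   /* size_t */

/* Kinds of value the watched slot may hold (illustrative only). */
enum wp_kind { WP_INT, WP_FLOAT };

struct wp_value {
    enum wp_kind kind;
    union { long i; double f; } as;
};

#define WP_HISTORY 8   /* how many past values the watchpoint retains */

struct watchpoint {
    struct wp_value history[WP_HISTORY];  /* ring buffer of recent writes */
    size_t count;                         /* total number of writes seen */
};

struct wp_value wp_int(long i)
{
    struct wp_value v;
    v.kind = WP_INT;
    v.as.i = i;
    return v;
}

struct wp_value wp_float(double f)
{
    struct wp_value v;
    v.kind = WP_FLOAT;
    v.as.f = f;
    return v;
}

/* Records a write. Returns 1 when the kind differs from the previous
   write (the "irregular" case where a debugger would stop), else 0. */
int wp_record(struct watchpoint *wp, struct wp_value v)
{
    int irregular = 0;
    if (wp->count > 0) {
        const struct wp_value *prev =
            &wp->history[(wp->count - 1) % WP_HISTORY];
        irregular = (prev->kind != v.kind);
    }
    wp->history[wp->count % WP_HISTORY] = v;
    wp->count++;
    return irregular;
}

/* Number of values currently retained in the history. */
size_t wp_retained(const struct watchpoint *wp)
{
    return wp->count < WP_HISTORY ? wp->count : WP_HISTORY;
}

/* Returns the n-th most recent value (0 = latest write). */
struct wp_value wp_recent(const struct watchpoint *wp, size_t n)
{
    assert(n < wp_retained(wp));
    return wp->history[(wp->count - 1 - n) % WP_HISTORY];
}
```

Feeding it `1, 2, 1.2` makes the third `wp_record` call return 1, and `wp_recent` then lets you walk back through the integers that came before the float.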
In reply to this post by stepharong
Figuring out why my image ballooned to 500+ MB and was not shrinking despite the new compactor was a nice one, plus the fact that it took 2 minutes to start.

Fixed yesterday with the help of Clement and Pavel. First, there was a massive memory leak from the devtools. https://gist.github.com/philippeback/39c63bb5aa26b79098511cdfea4fea7e fixed it (got the image back to 130 MB, which was OK given the amount of loaded code).

Then, looking at the startup times to find the culprit: Save/Stop/Start and then inspect LOG11. This gave us an entry related to FT2Settings taking the lion's share of the delay. Looks like this was because it was refreshing fonts at startup. On my Windows box it was 650 font entries. On Ben's Linux it was 1500+. Ben advised trying to comment out the last statement...

    FT2Handle allSubInstancesDo: [ :each | (handleToRelease = each handle) ifTrue: [ each beNull ] ]

This indeed made things faster, even with the setting. There is no proof that this will not lead to more trouble, because this code supports the workaround for FT2Handle duplicates (that FT2 bug alone deserves a cake by itself).

So things are back to normal now in that image, and nothing had to do with my own code. Thanks again for the help. And the compactor works wonderfully indeed.

Note that the current figures in the VM stats are completely bogus in places and need to be rewritten/recomputed, since we are using Spur and things are not the way they used to be. I took plenty of notes from what Clement told me. Hope to integrate that into the MemoryMonitor at some point.

Phil
Hi Phil, are the bugs entered in FogBugz?

Stef
In reply to this post by stepharong
> On Mar 9, 2017, at 12:50 PM, stepharong <[hidden email]> wrote:
>
> The idea is that we want to see how we can improve our debugging arsenal.
> So it is important that your scenario gives us some hints.
> It is difficult to convey what we are really looking for :)

I think one of the biggest improvements for the debugger would be for it to know the difference between my code and the system code (or not my code). Almost always, I don’t want to step through any system code while debugging my code. I’m expert enough to generally know when I should use into, over, or through, but it is a real pain to keep moving the mouse from one button to another. Often I end up hitting the wrong button by accident and have to restart the debugging process. I want a “step until you come back to my code” button. I think such a change would help newbies too. I don’t know how much time I spent debugging through methods in OrderedCollection when I was learning Smalltalk. In over 25 years, I still haven’t found a bug while stepping through OrderedCollection>>do:.

Another item that I’ve thought about but have never implemented is a learning debugger. Often the hard things to debug are inside loops where you have to continuously switch between over, into and through. For example, I may know that a method is a lazy accessor and not want to step into it. However, I may want to step into the next message. A learning debugger could keep track of what I did before and do it as the default the next time. Or maybe there could be a little switch on every message send that says whether you want over, into or through. There could also be a “keep going” option for all expressions. For example, I generally don’t want to step through a bunch of literal assignments (e.g., “x := 4”). Such a change to the debugger would help eliminate pair debugging sessions whose conversations go like: “over, over, into, over, into…”. Such conversations don’t find bugs, but are necessary to get to the location where the bug may exist.

One thing that you may want in your debugging arsenal is lightweight classes to implement per-object breakpoints/watchpoints. For example, you may want to set a breakpoint in a text widget. However, since that text widget is used all over, it would crash the system if you just set a breakpoint in the text widget class. Instead you can create a lightweight class with the breakpoint and change the particular text widget to use the lightweight class. This way it can trigger the breakpoint without crashing the image. I don’t use lightweight classes often, but in certain cases they are really handy (and almost necessary).

John Brant