The four byte bug lives


Bill Schwab-2
Hello all,

Remember the binary filing problem (it typically showed up when saving view
resources) that corrupts some or all of the first four bytes of a class name,
and was (in my limited experience with it) fairly common on NT?  I've now
seen it (with STB, not in the VC) on win2k sp2.  Any ideas?  A reproducible
case would be terrific :)

Have a good one,

Bill

--
Wilhelm K. Schwab, Ph.D.
[hidden email]



Re: The four byte bug lives

Christopher J. Demers
Bill Schwab <[hidden email]> wrote in message
news:ao2cqm$hs79j$[hidden email]...
>
> Remember the binary filing problem (typically showed up when saving view
> resources) that corrupts some/all of the first four bytes of a class name,
> and was (in my limited experience with it) fairly common on NT?  I've now
> seen it (with STB, not in the VC) on win2k sp2.  Any ideas?  A reproducible
> case would be terrific :)
>

Is this in Dolphin XP or 4.0, EXE or development image?  This distresses me
as I will soon be releasing an application that uses a fair amount of STB.  I
may have to replace STB if I can't trust it.  Is the corruption you are
seeing still just the first four bytes of the class name, or something worse?
At least that is detectable and probably recoverable.

If you have found a machine that causes these corruptions you might write a
loop to write out to STB and read it back.  You could see how often it
happens in the loop.  I did this in Dolphin 4.0 on an NT machine.  I used
the code below to open and save a view in a loop.  I ran it 100 times, and
once the STB was corrupt.  Then I ran it again for another 100 times and
twice it was corrupt.  Then I could not get it to happen again after running
the code a few hundred more times.  However I bet it would happen again if I
let it run long enough.  This code logs successful loads as well as
failures.

=============
errors := OrderedCollection new.
rcName := 'Default view'. rcClassName := ClassName.
100 timesRepeat: [
 [rc := (ResourceIdentifier class: rcClassName name: rcName) resource load.
 testView := ViewResource inSTBByteArray
  save: rc;
  loadWithContext: View desktop.
 rc close.
 errors add: (TimeStamp current -> #ok)]
   on: Error do: [ :error | errors add: (TimeStamp current -> error)]].

(errors select: [ :ass | (ass value = #ok) not]) size
==============

Actually now that I look back at my 4.0 image on NT I see that the STB
corruption not only mangles class names, but apparently can corrupt any
string as I have a tab text property in my class browser that became corrupt
after I resaved the view.  It always seems to be the first four bytes.

I think the STB corruption is caused by a semi-random error.  It seems to be
more common on some computers than others.  I have never witnessed an STB
corruption on my W2K machine, but I have frequently experienced it on my
WINNT machine.

I wonder if OA could stress test the STBing and reconstituting of a complex
series of objects to see if they can reproduce this error.  Perhaps some
code could be run a few thousand times overnight.  I would feel much more
comfortable using STB if I had a better understanding of this problem.
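
A minimal sketch of what such a soak test might look like, for anyone who
wants to try it without involving views at all.  It assumes the Dolphin
convenience methods Object>>binaryStoreBytes and
Object class>>fromBinaryStoreBytes: are available in the image; the sample
object graph and the iteration count are made up for illustration.

```smalltalk
"Round-trip a small object graph through STB many times and count the
 iterations that either fail to load or load back different from the
 original.  The sample data and the 5000 iterations are illustrative
 assumptions, not values taken from any real test run."
| sample failures |
sample := OrderedCollection new.
sample
    add: 'IdentityDictionary';
    add: #aSymbol;
    add: (Array with: 1 with: 'two' with: 3.0).
failures := 0.
5000 timesRepeat: [
    [| bytes copy |
    bytes := sample binaryStoreBytes.
    copy := Object fromBinaryStoreBytes: bytes.
    copy = sample ifFalse: [failures := failures + 1]]
        on: Error
        do: [:error | failures := failures + 1]].
failures
```

Comparing the reloaded copy with the original should also catch a corrupted
string that happens to load without raising an error, which the class name
failures alone would not show.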

Chris



Re: The four byte bug lives

Andy Bower
Christopher,

> If you have found a machine that causes these corruptions you might write a
> loop to write out to STB and read it back.  You could see how often it
> happens in the loop.  I did this in Dolphin 4.0 on an NT machine.  I used
> the code below to open and save a view in a loop.  I ran it 100 times, and
> once the STB was corrupt.  Then I ran it again for another 100 times and
> twice it was corrupt.  Then I could not get it to happen again after running
> the code a few hundred more times.  However I bet it would happen again if I
> let it run long enough.  This code logs successful loads as well as
> failures.
>
> =============
> errors := OrderedCollection new.
> rcName := 'Default view'. rcClassName := ClassName.
> 100 timesRepeat: [
>  [rc := (ResourceIdentifier class: rcClassName name: rcName) resource load.
>  testView := ViewResource inSTBByteArray
>   save: rc;
>   loadWithContext: View desktop.
>  rc close.
>  errors add: (TimeStamp current -> #ok)]
>    on: Error do: [ :error | errors add: (TimeStamp current -> error)]].
>
> (errors select: [ :ass | (ass value = #ok) not]) size
> ==============

I've tried your test, so far on Win XP only and with D5. If I try with a
classname of ClassBrowserShell (a complex view) then I get 38 failures at
the end of the 100. These, however, are caused by Windows running out of
resources probably because the finalization hasn't been allowed to run in
the loop. What errors are you seeing in your log? I've run a simpler view
(Scribble) 5000 times with no errors.

Also can you let us know if this problem is occurring with D5 or only D4 and
whether anyone has seen it on operating systems other than NT. I'll try your
test on NT later (when I can remember the bloomin' administrator password).

Best Regards,

Andy Bower
Dolphin Support
http://www.object-arts.com
---
Are you trying too hard?
http://www.object-arts.com/Relax.htm
---



Re: The four byte bug lives

Andy Bower
Folks,

> whether anyone has seen it on operating systems other than NT.

Sorry, just re-read Bill's post and see that it also goes wrong on Win2K.
Bill, is this D4 or D5?

Best Regards,

Andy Bower
Dolphin Support
http://www.object-arts.com
---
Are you trying too hard?
http://www.object-arts.com/Relax.htm
---



Re: The four byte bug lives

Chris Uppal-3
In reply to this post by Andy Bower
Andy,

> I've tried your test, so far on Win XP only and with D5. If I try with a
> classname of ClassBrowserShell (a complex) view then I get 38 failures at
> the end of the 100. These, however, are caused by Windows running out of
> resources probably because the finalization hasn't been allowed to run in
> the loop.

I just tried the CHB loop on W2ksp3.  Same pattern of failures, but I
noticed that Windows Task Manager's idea of how many GDI objects were in use
climbed to over 3200 *and hasn't dropped back again*.  I've killed the
window where I ran Christopher's loop, and saved the image a couple of times
(to force full GC and finalisation), but it still hasn't dropped back.

FWIW, I seem to have around 5700 instances of Menu in my image now, and they
are holding on to some 2400 instances of ExternalHandle in their 'handle'
instvar.
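
Counts like these can be gathered from a workspace.  The following is only a
sketch: it assumes Object>>instVarNamed: is available for peeking at the
'handle' instance variable, and it counts direct instances of Menu only.

```smalltalk
"Count the Menu instances in the image and how many of them currently hold
 an ExternalHandle.  Reading the 'handle' instance variable by name is an
 assumption about how Menu stores its handle."
| menus withHandles |
menus := Menu allInstances.
withHandles := menus select: [:each |
    (each instVarNamed: 'handle') isKindOf: ExternalHandle].
Array with: menus size with: withHandles size
```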

    -- chris



Re: The four byte bug lives

Chris Uppal-3
Following myself up:

> FWIW, I seem to have around 5700 instances of Menu in my image now, and they
> are holding on to some 2400 instances of ExternalHandle in their 'handle'
> instvar.

Even after saving and restoring the image, those resources are still being
held; they seem to be kept alive by the input manager's window list:

    is := SessionManager current inputState.
    is windows size. --> 2708
    is topLevelWindows size. --> 74
    (is windows select: [:each | each asParameter notNil]) size. --> 2708

Now to go clean them up...

    -- chris



Re: The four byte bug lives

Bill Schwab-2
In reply to this post by Andy Bower
Andy,

> > whether anyone has seen it on operating systems other than NT.
>
> Sorry, just re-read Bill's post and see that it also goes wrong on Win2K.
> Bill, is this D4 or D5?

No need to apologize - you're taking it seriously, which is the first step
toward fixing it.  Thanks.

It happened most recently with deployed apps from D5 running on 2k sp2.  I
believe it also happened with D4 apps on the same machines; I'll try to dig
through some log files to verify that.

I first noticed the problem with D4 on NT, and it was sufficiently common
that I installed 2k as a hot fix to be able to keep working.  At first it
seemed that 2k fixed the problem, but it now appears that it merely makes it
far less likely to occur.  There is always a chance that it's my fault, but
since others reported the same behavior on NT, I'm inclined to think
otherwise.

Put another way, I thought 2k sp1 was the fix, then I thought 2k sp2 was the
fix, now I have evidence that it can even happen on sp2.

It's possible that it exists outside of the binary filer, but it would be
difficult to tell since most of my FileStream or socket I/O includes STB
data.  But the reports from other users have (I think) always centered on
VC saves of view resources which are STB filed into byte arrays in the
image.  The only other really huge collection of Dolphin-written files I
have are text log files.

Have a good one,

Bill

--
Wilhelm K. Schwab, Ph.D.
[hidden email]



Re: The four byte bug lives

Bill Schwab-2
In reply to this post by Christopher J. Demers
Chris,

> Is this in Dolphin XP or 4.0, EXE or development image?

D5, EXE, win2k sp2.

> This distresses me
> as I will soon be releasing an application that uses a fair amount of STB.  I
> may have to replace STB if I can't trust it.  Is the corruption you are
> seeing still just the first four bytes of the class name or something worse?

First four bytes of the class name.  Ok, there _could_ be further corruption
down the line, but the binary filer bangs out after the corrupted class
name, so that's what I see in the error dumps.

> At least that is detectable and probably recoverable.

Agreed.  But the most recent episodes at least seem to require rebooting the
machine to fix it.


> If you have found a machine that causes these corruptions you might write a
> loop to write out to STB and read it back.

Unfortunately the box in question is a little difficult to take out of
service, though I might be able to sneak another in its place in a week or
two.


> Actually now that I look back at my 4.0 image on NT I see that the STB
> corruption not only mangles class names, but apparently can corrupt any
> string as I have a tab text property in my class browser that became corrupt
> after I resaved the view.  It always seems to be the first four bytes.

Yup - I think it is invariably in the first four bytes of a string (maybe
symbols too).  I do not recall seeing the corruption outside of the first
four bytes.


> I think the STB corruption is caused by a semi-random error.  It seems to be
> more common on some computers than others.  I have never witnessed an STB
> corruption on my W2K machine, but I have frequently experienced it on my
> WINNT machine.

And that might be the answer.  Since we seem to agree that NT is better at
exhibiting the problem, that's probably where we should be stress testing.
I have a machine downstairs that might be about to get reformatted with NT;
I'll have to see what's on it and how much memory it has.

Possible explanations: 2k has better memory management than NT, so could
this be something in the streaming primitives that gets confused by
(re)allocation?  Maybe an incorrect type cast??  Gremlins<g>???

My most recent observation was fixed by rebooting the offending machine.  I
was under some pressure to fix it, so I can't claim to have tried everything
along the way.  A previous encounter (perhaps on 2k sp1 though, and on a
different machine) also seemed to require a reboot to fix.  Could the
machines be suffering some kind of heap fragmentation causing more active
memory management and therefore making the problem more likely to appear???

Ok, I've exposed far too much of my extensive ignorance on the subject :)

Have a good one,

Bill

--
Wilhelm K. Schwab, Ph.D.
[hidden email]



Re: The four byte bug lives

Christopher J. Demers
In reply to this post by Andy Bower
Andy Bower <[hidden email]> wrote in message
news:3da567f8$[hidden email]...
>
> I've tried your test, so far on Win XP only and with D5. If I try with a
> classname of ClassBrowserShell (a complex) view then I get 38 failures at
> the end of the 100. These, however, are caused by Windows running out of
> resources probably because the finalization hasn't been allowed to run in
> the loop. What errors are you seeing in your log? I've run a simpler view
> (Scribble) 5000 times with no errors.

I was using a simple view (with no menus I think).  The error I got was
about not being able to find Dolphin.pak.  Basically what happens is the
dynamic class loading mechanism tries to load the corrupt class name.  For
example in one case IdentityDictionary becomes ´j¢tityDictionary.
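
Since only the first four bytes are damaged, the rest of the name is often
enough to identify the class.  A rough sketch of that idea follows; the
'????' prefix stands in for the corrupted bytes, and the search only covers
classes descended from Object.

```smalltalk
"Guess which classes a damaged name could have been by comparing everything
 after the first four characters.  The damaged value here is a stand-in,
 not bytes taken from a real corrupt file."
| damaged candidates |
damaged := '????tityDictionary'.
candidates := Object withAllSubclasses select: [:each |
    each name size = damaged size and: [
        (each name copyFrom: 5 to: each name size)
            = (damaged copyFrom: 5 to: damaged size)]].
candidates collect: [:each | each name]
```

More than one class can match when several names share a tail, so this is a
diagnostic aid rather than an automatic repair.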

> Also can you let us know if this problem is occurring with D5 or only D4 and
> whether anyone has seen it on operating systems other than NT. I'll try your
> test on NT later (when I can remember the bloomin' administrator password).

I personally only see this error on a Windows NT machine running Dolphin
4.0.  However it sounds like Bill ran into this on D5 on W2K.  It was
corrupting my views so often I had to stop using the Dolphin development
environment on NT.  I can't install Dolphin 5.0 on this machine because the
wonderful MSI file demands a higher service pack than I have before it will
even dare to let me attempt to install the upgrade.  I am running Windows NT
version 4.00.1381 (SP4 I think).  Dolphin 5.0 wants SP6 I think.  In the
past Windows NT SP's have rendered systems unusable (for me at least, it is
a bit like Russian Roulette) and I need to keep this machine running and
compatible with some rather old programs and code right now.

I shall assume you won't mind if I send you a small example of a corrupt stb
file via e-mail.

I really appreciate that you are looking into this issue.

Chris



Re: The four byte bug lives

Bill Schwab-2
Chris,

> > I've tried your test, so far on Win XP only and with D5. If I try with a
> > classname of ClassBrowserShell (a complex view) then I get 38 failures at
> > the end of the 100. These, however, are caused by Windows running out of
> > resources probably because the finalization hasn't been allowed to run in
> > the loop. What errors are you seeing in your log? I've run a simpler view
> > (Scribble) 5000 times with no errors.
>
> I was using a simple view (with no menus I think).  The error I got was
> about not being able to find Dolphin.pak.  Basically what happens is the
> dynamic class loading mechanism tries to load the corrupt class name.  For
> example in one case IdentityDictionary becomes ´j¢tityDictionary.

That's the critter :)


> I personally only see this error on a Windows NT machine running Dolphin
> 4.0.  However it sounds like Bill ran into this on D5 on W2K.

Yes.  It was seeing it on win2k sp2 that _really_ got my attention; prior to
that I had thought sp2 eliminated the bug.  But again, I would encourage
using the most fragile environment possible to make it easier to reproduce
the problem.  A conditional breakpoint in the depths of the VM (or maybe
even in the image itself) will hopefully then get the problem to reveal
itself.  The D4/NT fix will probably fix D5/2k too.


>  It was
> corrupting my views so often I had to stop using the Dolphin development
> environment on NT.

Ditto.  PenWindows issues had me stuck on 9x for a long time, but when that
cloud lifted, I switched to an NT box for development and very quickly put
Win2k on it because of this problem.


> I can't install Dolphin 5.0 on this machine because the
> wonderful MSI file demands a higher service pack than I have before it will
> even dare to let me attempt to install the upgrade.  I am running Windows NT
> version 4.00.1381 (SP4 I think).  Dolphin 5.0 wants SP6 I think.  In the
> past Windows NT SP's have rendered systems unusable (for me at least, it is
> a bit like Russian Roulette)

How about Redmond Roulette??  Sorry, couldn't resist.  I hear you about
updates in general though, and Microsoft's new idea of automatic updates
scares the DLL out of me - it's just plain unacceptable.  With that said, I
think OA is wise to at least have the installer warn about service packs,
based on things I've been told re NT.  I didn't use NT long enough to claim
experience of my own.


> I really appreciate that you are looking into this issue.

Likewise.  Thanks!!!

Have a good one,

Bill

--
Wilhelm K. Schwab, Ph.D.
[hidden email]



Re: The four byte bug lives

Christopher J. Demers
Bill Schwab <[hidden email]> wrote in message
news:ao4r37$jes1g$[hidden email]...
>
> How about Redmond Roulette??  Sorry, couldn't resist.  I hear you about
> updates in general though, and Microsoft's new idea of automatic updates
> scares the DLL out of me - it's just plain unacceptable.  With that said, I
> think OA is wise to at least have the installer warn about service packs,
> based on things I've been told re NT.  I didn't use NT long enough to claim
> experience of my own.

I wouldn't mind a warning about the service packs, but as it is it will not
even let me install it.  The Dolphin setup program won't give me enough rope
to hang myself even if I wanted to. ;)  A government entity once released a
program whose install file refused to permit an installation on Windows ME
because they had not tested it on that specific platform even though they
had tested it on Windows 98.  I went through a roundabout route and got it
working on Windows ME, and it ran fine as I suspected it would.  Talk about
afraid of the unknown. ;)  The only thing preventing it from running was
the setup program.

> > I really appreciate that you are looking into this issue.
> Likewise.  Thanks!!!

This whole 4 byte corruption business reminds me about a similar problem I
had in an old programming environment I used to use called Actor.  I had
problems with Strings infrequently but randomly converting themselves into
Rect (as in rectangle) objects.  It was driving me crazy because I was
getting reports from end users but could not track down the cause of the
error, and the developers of Actor had been out of business for years.  I
guess I can relate to OA's situation here.  I could not reproduce it in a
development environment for years.  Then by a stroke of luck I ran into one
in my development image.  I captured it and did an object autopsy.  I found
that it was just the object identity that was being swapped, and that the
original string bytes were contained in the Rect.  I assume the error was
caused in the VM.  My workaround was to have the Rect objects convert
themselves back to Strings by adding a few methods to Rect.  I could not
really test the fix myself since the occurrence was so rare, but I added
logging code and released the fix.  Sure enough I got reports back that it
logged and fixed the problem perfectly.  Sure it was a kludge, but it
worked!  Hopefully this Dolphin issue is not the revenge of the Rect coming
back to haunt me. ;)

Just in case there is a shortage of wild speculation about this problem, here
are the things I am wondering:

1. STBPrefix is interesting because it has a 4 byte dword.  I wonder if this
could be trampling the class name, or overflowing its expected size.
2. Could something be causing WriteStream>>next:putAll:startingAt: to
overwrite the previous contents of the buffer?  A negative size perhaps?
3. Apparently object headers are 4 bytes.  I wonder if the junk could be an
object header.
4. Far out, but who knows: Wasn't there a math bug in Pentium chips a while
ago?  Could there be a rare math error going on with the bit shifting in the
STBProxy?

Ok enough speculation for now.

Chris



Re: The four byte bug lives

Bill Schwab-2
Chris,

> Just in case there is a shortage of wild speculation about this problem, here
> are the things I am wondering:
>
> 1. STBPrefix is interesting because it has a 4 byte dword.  I wonder if this
> could be trampling the class name, or overflowing its expected size.
> 2. Could something be causing WriteStream>>next:putAll:startingAt: to
> overwrite the previous contents of the buffer?  A negative size perhaps?
> 3. Apparently object headers are 4 bytes.  I wonder if the junk could be an
> object header.

What about an integer value (address or something) that fits a DWORD but not
in a SmallInteger, or something similarly goofy that would show up randomly.
Either way, your focus on the prefix could easily turn out to be on target.
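
To make the size mismatch concrete, here is a small workspace sketch.  It
only illustrates that a full 32-bit value does not fit in a SmallInteger on
a 32-bit VM; it says nothing about whether the filer actually handles such a
value incorrectly, which remains speculation.

```smalltalk
"Compare the largest unsigned 32-bit DWORD value with the largest
 SmallInteger.  On a 32-bit VM a SmallInteger has fewer than 32 value bits,
 so values like this one must be represented as large integers."
| dwordValue |
dwordValue := 16rFFFFFFFF.
Array
    with: SmallInteger maximum
    with: dwordValue > SmallInteger maximum
    with: dwordValue class
```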


> 4. Far out, but who knows: Wasn't there a math bug in Pentium chips a while
> ago?  Could there be a rare math error going on with the bit shifting in the
> STBProxy?

Are you thinking of the floating point problem, or was there another one?


> Ok enough speculation for now.

Never :)  Maybe it has something to do with #writeClass:withPrefix: and the
way it handles reference vs. full details re the class.  On second thought,
maybe that is _more than enough_ speculation for now :)

Have a good one,

Bill

--
Wilhelm K. Schwab, Ph.D.
[hidden email]



Re: The four byte bug lives

Andy Bower
In reply to this post by Chris Uppal-3
Chris (Uppal),

> > FWIW, I seem to have around 5700 instances of Menu in my image now, and
> > they are holding on to some 2400 instances of ExternalHandle in their
> > 'handle' instvar.
>
> Saving and restoring the image still is holding on to those resources; they
> seem to be being kept alive by the input manager's window list:
>
>     is := SessionManager current inputState.
>     is windows size. --> 2708
>     is topLevelWindows size. --> 74
>     (is windows select: [:each | each asParameter notNil]) size. --> 2708
>
> Now to go clean them up...

A "panic" will do it. I think this problem (not disposing of CHB views
correctly) is caused by the problem that Ian reported a few days back in the
thread, "IDE extensions problem". So far I've only managed to get the STB
load/save loop to fail on the ClassBrowserShell view and with the "unable to
create window" error, not the corrupt class name. I will try this again
under NT today.

I think the first problem to patch is the one where CHBs are not being
disposed of properly.

Best Regards,

Andy Bower
Dolphin Support
http://www.object-arts.com
---
Are you trying too hard?
http://www.object-arts.com/Relax.htm
---



Re: The four byte bug lives

Andy Bower
Chris,

Please disregard that previous message; it does fail to create windows with
views other than the CHB.

Best Regards,

Andy Bower
Dolphin Support
http://www.object-arts.com
---
Are you trying too hard?
http://www.object-arts.com/Relax.htm
---



Re: The four byte bug lives

Andy Bower
Folks,

> Please disregard that previous message; it does fail to create windows with
> views other than the CHB

Doh!

The "failure to create window" problem and the resultant build up of views
is a problem with Christopher (Demers)'s original test. The views that are
being loaded are never destroyed, but just left hanging around being
invisible children of the desktop. You should alter the test to read:

errors := OrderedCollection new.
rcName := 'Default view'. rcClassName := ClassName.
100 timesRepeat: [
 [rc := (ResourceIdentifier class: rcClassName name: rcName) resource load.
 testView := ViewResource inSTBByteArray
  save: rc;
  loadWithContext: View desktop.
 rc close.
 testView destroy. "******"
 errors add: (TimeStamp current -> #ok)]
   on: Error do: [ :error | errors add: (TimeStamp current -> error)]].

(errors select: [ :ass | (ass value = #ok) not]) size.

Now to soak test this under NT.

Best Regards,

Andy Bower
Dolphin Support
http://www.object-arts.com
---
Are you trying too hard?
http://www.object-arts.com/Relax.htm
---



Re: The four byte bug lives

Andy Bower
Chris,

> Now to soak test this under NT.

I have just finished a 2 hour run under NT4 SP6 on an Athlon 1800+ m/c with
no errors. I ran three processes, two saving/loading the CHB view as per
your example and one saving and loading a collection of all the classes in
the system. I had wondered whether it might be a process safety issue, hence
the three processes all doing STB work. I was using the release version of
D5.

I'm not sure where to go from here. I notice that the corrupt STB file you
sent me had a filename that indicated that it was generated on 30/1/2001
which is well back into the days of D4. Do you have an example from the
recent D5 failures, e.g. can you run your test again using a fresh D5 image
and NT and get the error handler to save out the file?

How many Dolphin processes are running in the image that gives the errors?
Does it fail in a fresh image? Is your NT box doing anything else at the
time (are there any unusual services running that we wouldn't have here).

Without being able to get it to fail here it is going to be tricky to go
much further. I'll leave it going over the weekend but I suspect it's not
going to fail.

(Interesting that you used Whitewater's Actor BTW. We looked at that a long
time ago in the days when it only barely squeezed into 640K of memory
alongside Windows 1 (or was it 2)).

Best Regards,

Andy Bower
Dolphin Support
http://www.object-arts.com
---
Are you trying too hard?
http://www.object-arts.com/Relax.htm
---



Re: The four byte bug lives

Bill Schwab-2
Andy,

Again, thanks for investigating.  IIRC, Chris had indicated not seeing this
under D5.  If your test turns up nothing over the weekend (which I agree
seems likely), then it might be worth repeating your experiment in D4; I
suspect it's the same problem, so anything that makes it show itself is
likely worth doing.

I'm temporarily hindered by being unable to find my NT media - I know I have
it though.

Have a good one,

Bill

--
Wilhelm K. Schwab, Ph.D.
[hidden email]



Re: The four byte bug lives

Christopher J. Demers
In reply to this post by Andy Bower
Andy Bower <[hidden email]> wrote in message
news:ao60q1$jh5rh$[hidden email]...
> The "failure to create window" problem and the resultant build up of views
> is a problem with Christopher (Demers) original test. The views that are
> being loaded are never destroyed, but just left hanging around being
> invisible children of the desktop. You should alter the test to read:
...

I suppose I should have mentioned that since I don't use the image on NT
anymore I had not been snapshotting it and hence never noticed this.  Good
catch.  I hope my example code hasn't distracted too much from the root
issue.

Chris



Re: The four byte bug lives

Christopher J. Demers
In reply to this post by Andy Bower
Andy Bower <[hidden email]> wrote in message
news:ao6esi$jfq9f$[hidden email]...
> I have just finished a 2 hour run under NT4 SP6 on an Athlon 1800+ m/c with
> no errors. I ran three processes, two saving/loading the CHB view as per

I wonder if you have an older, less advanced machine you could try?  The
machine that I can always make it happen on has a Pentium 300 MHz CPU and 128
MB RAM.

Bill: What kind of machine were you running that gave you the error?

> I'm not sure where to go from here. I notice that the corrupt STB file you
> sent me had a filename that indicated that it was generated on 30/1/2001
> which is well back into the days of D4. Do you have an example from the
> recent D5 failures, e.g. can you run your test again using a fresh D5 image
> and NT and get the error handler to save out the file?

I can't install D5 on my NT machine because I have SP4 rather than SP6.  The
D5 install program doesn't even let me install it.  I only run D5 on W2K and
I have never experienced the STB problem there, however I never experienced
it with D4 on that machine (Pentium Pro 200 Mhz with 128 MB ram) either.  If
I thought this problem would never happen in D5 I would not worry.  However
Bill has reported seeing the exact same kind of corruption in D5 on W2K that
I am seeing with D4 on NT.  If I understand Bill's posting it sounds like
the issue is rare with D5 on W2K.  Even if the corruption is rare I am just
worried that after I release my program I am going to get some calls from
end users whose files are corrupt.

I just made an EXE in D5 on my W2K machine that contains my STB stress code
(with Andy's change).  I do not get any errors on W2K or NT with the D5 EXE.
I will try running it over the weekend to stress it.

In a virgin D4 image I loaded a package with only one dialog class and a
view.  I was able to get 18 corruptions out of 100 tries.  Then I ran it a
few more times and got 2/100, 0/100,0/100, 10/100.  I waited between a few
seconds and a few minutes between the tests.

> How many Dolphin processes are running in the image that gives the errors?
> Does it fail in a fresh image? Is your NT box doing anything else at the
> time (are there any unusual services running that we wouldn't have here).

The process monitor shows only the standard 5 Dolphin processes.  I usually
have the following running: AOL IM, MS Outlook, MS Outlook Express (for
news), McAfee virus scanner and sometimes IE.  I am not running any servers
on the NT box.

> Without being able to get it to fail here it is going to be tricky to go
> much further. I'll leave it going over the weekend but I suspect it's not
> going to fail.

I understand.  I am not sure if you want to try D4 on NT in hopes of being
able to reproduce the problem and apply the fix to D5 since it seems to be
the same problem but at a lower frequency.  However I don't want you to
waste your time on this either.  It seems that something about D5 has
improved the issue over D4.  I guess the real test is to see what the real
world frequency of this issue is.  Perhaps this corruption will only occur
once every 10 years in D5, and Bill just got "lucky".  I guess the best
thing is for people to just report this here if it happens to them.  If Bill
or someone else runs into this again with D5 on W2K then I think it becomes
a more pressing issue.  I know at least three people have reported this in
D4 in the news group in the past.

If anyone wants to try it, here is a D4 package that includes a Dialog with a
view that I used for my tests above.  The example code is in the package
comment (it has Andy's fix), but I suggest either running it in a trash image
or not saving the image after the test.
http://www.mitchellscientific.com/smalltalk/Dolphin4/4ByteSTBErrorExample.pac

> (Interesting that you used Whitewater's Actor BTW. We looked at that a long
> time ago in the days when it only barely squeezed into 640K of memory
> alongside Windows 1 (or was it 2)).

Yep, I used it, and in fact I still support an application developed in it.
We are migrating that application to Dolphin.  The biggest problem with
Actor is the object pointer limitation (16 or 32 thousand, neither is big
enough).  It can get "fun" developing OO code when you have such a tight
object limitation.  I truly enjoy using Dolphin Smalltalk.  Thanks for
bringing it to life!

Chris



Re: The four byte bug lives

Christopher J. Demers
In reply to this post by Bill Schwab-2
Bill Schwab <[hidden email]> wrote in message
news:ao4l4h$j74qt$[hidden email]...
...
> My most recent observation was fixed by rebooting the offending machine.  I
> was under some pressure to fix it, so I can't claim to have tried everything
> along the way.  A previous encounter (perhaps on 2k sp1 though, and on a
> different machine) also seemed to require a reboot to fix.  Could the
> machines be suffering some kind of heap fragmentation causing more active
> memory management and therefore making the problem more likely to appear???

I am interested that you say rebooting seemed to fix the problem.  It seems
that there are two layers to this problem.  The root problem is the
corruption of STB data.  However where you probably notice the problem is
when you read back from the STB data.  I assume that you either got rid of
the corrupt STB files or fixed them.  Were newly created STB files becoming
corrupt?

Are you experiencing a situation where the program runs fine, happily
writing to and reading from STB files for a period of time, and then starts
frequently trashing STB files (even after deleting previously damaged files
and restarting the program), and continues to do so until it is rebooted?
That sounds worse than I thought it was.  I thought you had just stumbled
upon one instance of STB corruption in D5 on W2K.  Is STB corruption on this
machine a recurring situation?

I asked this in my reply to Andy, but in case you don't see it:  What kind of
computer causes this problem under W2K?  Mine is a 300 MHz Pentium with 128
MB RAM that causes trouble under NT.

Chris

