Error in SortedCollection do:

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Error in SortedCollection do:

Ron Teitelbaum
Hi all,

I just ran into an error and was wondering if anyone has ever seen this
before.  I have a sorted collection with a computationally intensive sorting
block.  

It blew up on a nil.  Low res jpg attached.

It appears to have worked for the first 12 items:
        newCollection in SortedCollection>>collect: has 12 items in it.

The 13th item returned a nil.  This is because array on the sorted
collection starts with 16 nils.  firstIndex is set to 17.  

Notice on the jpg it says firstIndex = 17 but index is 13.  So I can only
conclude that since index is 13 and 12 other objects were processed
successfully that the array and first index changed during processing.

I am not sure that I can reproduce this error.  I've been running the same
code for quite a while without seeing it.  If I can I will post it.

Has anyone seen this before or does anyone know why this may have happened?

Thanks for your help!

Ron Teitelbaum
[hidden email]

http://bugs.impara.de/view.php?id=6030 



SortedCollectionError.jpg (13K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Error in SortedCollection do:

Prof. Andrew P. Black
I'm making a wild guess here, Ron:  are you ABSOLUTELY sure that your  
sortblock represents an ordering relation, that is, for EVERY pair of  
values a and b,  ((sortBlock value: a value: b)  AND  (sortBlock  
value: b value: a)) implies a = b?   If this is not the case, the  
sort will very likely run off the end of the legitimate values in the  
array, and into the nils.

        Andrew


On 12 Feb 2007, at 08:09, Ron Teitelbaum wrote:

> Hi all,
>
> I just ran into an error and was wondering if anyone has ever seen  
> this
> before.  I have a sorted collection with a computationally  
> intensive sorting
> block.
>
> It blew up on a nil.  Low res jpg attached.
>
> It appears to have worked for the first 12 items:
> newCollection in SortedCollection>>collect: has 12 items in it.
>
> The 13th item returned a nil.  This is because array on the sorted
> collection starts with 16 nils.  firstIndex is set to 17.
>
> Notice on the jpg it says firstIndex = 17 but index is 13.  So I  
> can only
> conclude that since index is 13 and 12 other objects were processed
> successfully that the array and first index changed during processing.
>
> I am not sure that I can reproduce this error.  I've been running  
> the same
> code for quite a while without seeing it.  If I can I will post it.
>
> Has anyone seen this before or does anyone know why this may have  
> happened?
>
> Thanks for your help!
>
> Ron Teitelbaum
> [hidden email]
>
> http://bugs.impara.de/view.php?id=6030
> <SortedCollectionError.jpg>
>


Reply | Threaded
Open this post in threaded view
|

RE: Error in SortedCollection do:

Ron Teitelbaum
Hi Andrew,

I'm not actually getting an error in the sorting, and if I take the exact
same collection and reevaluate it in the debugger it comes up fine.

Here is the sortBlock:

        ^[:a :b | (a nodeID bitXor: self networkID) < (b nodeID bitXor: self
networkID)]

The values are 256 bit integers.  self networkID is an instance variable on
the model that creates the sortBlock.

I am doing some threading (the error always happens in the forked thread)
but I'm pretty sure there is no way that I might be adding to the collection
at the same time that I'm evaluating it.  I thought maybe if the collection
grew during the do: that would explain it, but I'm pretty sure that is not
happening.  

The error was coming up more frequently with more activity.  I was able to
fix the problem as far as I can tell with aSortedCollection
asOrderedCollection do:.  I figured that would just push the problem to the
orderedCollection conversion but it seems to work just fine (no errors, no
nils in the collection and it appears that nothing is lost although the last
part is hard to prove).  I guessing that the extra message sends in
asOrderedCollection allow the sorted collection to finish its sort.  Also
this appears to only happen when there is intensive activity.  I'm running
SSL connections, with seaside and multiple threads.  

It does worry me that I may loose values during the asOrderedCollection
conversion.  I may take it out and put a semaphore around it to make sure
that it's not growing from an add during the read.  I've tried to reproduce
it with a simple case but so far no luck.

Thanks for your suggestion and for looking at it.  

Ron Teitelbaum

> -----Original Message-----
> From: Andrew P. Black
> Sent: Thursday, February 15, 2007 7:10 PM
>
> I'm making a wild guess here, Ron:  are you ABSOLUTELY sure that your
> sortblock represents an ordering relation, that is, for EVERY pair of
> values a and b,  ((sortBlock value: a value: b)  AND  (sortBlock
> value: b value: a)) implies a = b?   If this is not the case, the
> sort will very likely run off the end of the legitimate values in the
> array, and into the nils.
>
> Andrew
>
>
> On 12 Feb 2007, at 08:09, Ron Teitelbaum wrote:
>
> > Hi all,
> >
> > I just ran into an error and was wondering if anyone has ever seen
> > this
> > before.  I have a sorted collection with a computationally
> > intensive sorting
> > block.
> >
> > It blew up on a nil.  Low res jpg attached.
> >
> > It appears to have worked for the first 12 items:
> > newCollection in SortedCollection>>collect: has 12 items in it.
> >
> > The 13th item returned a nil.  This is because array on the sorted
> > collection starts with 16 nils.  firstIndex is set to 17.
> >
> > Notice on the jpg it says firstIndex = 17 but index is 13.  So I
> > can only
> > conclude that since index is 13 and 12 other objects were processed
> > successfully that the array and first index changed during processing.
> >
> > I am not sure that I can reproduce this error.  I've been running
> > the same
> > code for quite a while without seeing it.  If I can I will post it.
> >
> > Has anyone seen this before or does anyone know why this may have
> > happened?
> >
> > Thanks for your help!
> >
> > Ron Teitelbaum
> > [hidden email]
> >
> > http://bugs.impara.de/view.php?id=6030
> > <SortedCollectionError.jpg>
> >
>
>



Reply | Threaded
Open this post in threaded view
|

Re: Error in SortedCollection do:

Andreas.Raab
You are describing all the classic symptoms of a race condition. Try
using a semaphore/mutex to protect your collection (and the nodes inside
it!).

Cheers,
   - Andreas

Ron Teitelbaum wrote:

> Hi Andrew,
>
> I'm not actually getting an error in the sorting, and if I take the exact
> same collection and reevaluate it in the debugger it comes up fine.
>
> Here is the sortBlock:
>
> ^[:a :b | (a nodeID bitXor: self networkID) < (b nodeID bitXor: self
> networkID)]
>
> The values are 256 bit integers.  self networkID is an instance variable on
> the model that creates the sortBlock.
>
> I am doing some threading (the error always happens in the forked thread)
> but I'm pretty sure there is no way that I might be adding to the collection
> at the same time that I'm evaluating it.  I thought maybe if the collection
> grew during the do: that would explain it, but I'm pretty sure that is not
> happening.  
>
> The error was coming up more frequently with more activity.  I was able to
> fix the problem as far as I can tell with aSortedCollection
> asOrderedCollection do:.  I figured that would just push the problem to the
> orderedCollection conversion but it seems to work just fine (no errors, no
> nils in the collection and it appears that nothing is lost although the last
> part is hard to prove).  I guessing that the extra message sends in
> asOrderedCollection allow the sorted collection to finish its sort.  Also
> this appears to only happen when there is intensive activity.  I'm running
> SSL connections, with seaside and multiple threads.  
>
> It does worry me that I may loose values during the asOrderedCollection
> conversion.  I may take it out and put a semaphore around it to make sure
> that it's not growing from an add during the read.  I've tried to reproduce
> it with a simple case but so far no luck.
>
> Thanks for your suggestion and for looking at it.  
>
> Ron Teitelbaum
>
>> -----Original Message-----
>> From: Andrew P. Black
>> Sent: Thursday, February 15, 2007 7:10 PM
>>
>> I'm making a wild guess here, Ron:  are you ABSOLUTELY sure that your
>> sortblock represents an ordering relation, that is, for EVERY pair of
>> values a and b,  ((sortBlock value: a value: b)  AND  (sortBlock
>> value: b value: a)) implies a = b?   If this is not the case, the
>> sort will very likely run off the end of the legitimate values in the
>> array, and into the nils.
>>
>> Andrew
>>
>>
>> On 12 Feb 2007, at 08:09, Ron Teitelbaum wrote:
>>
>>> Hi all,
>>>
>>> I just ran into an error and was wondering if anyone has ever seen
>>> this
>>> before.  I have a sorted collection with a computationally
>>> intensive sorting
>>> block.
>>>
>>> It blew up on a nil.  Low res jpg attached.
>>>
>>> It appears to have worked for the first 12 items:
>>> newCollection in SortedCollection>>collect: has 12 items in it.
>>>
>>> The 13th item returned a nil.  This is because array on the sorted
>>> collection starts with 16 nils.  firstIndex is set to 17.
>>>
>>> Notice on the jpg it says firstIndex = 17 but index is 13.  So I
>>> can only
>>> conclude that since index is 13 and 12 other objects were processed
>>> successfully that the array and first index changed during processing.
>>>
>>> I am not sure that I can reproduce this error.  I've been running
>>> the same
>>> code for quite a while without seeing it.  If I can I will post it.
>>>
>>> Has anyone seen this before or does anyone know why this may have
>>> happened?
>>>
>>> Thanks for your help!
>>>
>>> Ron Teitelbaum
>>> [hidden email]
>>>
>>> http://bugs.impara.de/view.php?id=6030
>>> <SortedCollectionError.jpg>
>>>
>>
>
>
>
>


Reply | Threaded
Open this post in threaded view
|

RE: Error in SortedCollection do:

Ron Teitelbaum
Andreas,

Thanks I will.  I'd already created a SharedQueueSorted for something
similar.  I'll replace my sortedCollection with it and see if that fixes
things.  I really didn't want the extra overhead for this piece, oh well.

Any news about QWAQ you can share yet without a NDA?  Nice mention in that
CIO article
http://weeklysqueak.wordpress.com/2007/02/15/cio-insight-interview-with-alan
-kay/ .

Take care,

Ron

> -----Original Message-----
> From: [hidden email] [mailto:squeak-dev-
> [hidden email]] On Behalf Of Andreas Raab
> Sent: Thursday, February 15, 2007 8:08 PM
> To: [hidden email]; The general-purpose Squeak developers list
> Subject: Re: Error in SortedCollection do:
>
> You are describing all the classic symptoms of a race condition. Try
> using a semaphore/mutex to protect your collection (and the nodes inside
> it!).
>
> Cheers,
>    - Andreas
>
> Ron Teitelbaum wrote:
> > Hi Andrew,
> >
> > I'm not actually getting an error in the sorting, and if I take the
> exact
> > same collection and reevaluate it in the debugger it comes up fine.
> >
> > Here is the sortBlock:
> >
> > ^[:a :b | (a nodeID bitXor: self networkID) < (b nodeID bitXor: self
> > networkID)]
> >
> > The values are 256 bit integers.  self networkID is an instance variable
> on
> > the model that creates the sortBlock.
> >
> > I am doing some threading (the error always happens in the forked
> thread)
> > but I'm pretty sure there is no way that I might be adding to the
> collection
> > at the same time that I'm evaluating it.  I thought maybe if the
> collection
> > grew during the do: that would explain it, but I'm pretty sure that is
> not
> > happening.
> >
> > The error was coming up more frequently with more activity.  I was able
> to
> > fix the problem as far as I can tell with aSortedCollection
> > asOrderedCollection do:.  I figured that would just push the problem to
> the
> > orderedCollection conversion but it seems to work just fine (no errors,
> no
> > nils in the collection and it appears that nothing is lost although the
> last
> > part is hard to prove).  I guessing that the extra message sends in
> > asOrderedCollection allow the sorted collection to finish its sort.
> Also
> > this appears to only happen when there is intensive activity.  I'm
> running
> > SSL connections, with seaside and multiple threads.
> >
> > It does worry me that I may loose values during the asOrderedCollection
> > conversion.  I may take it out and put a semaphore around it to make
> sure
> > that it's not growing from an add during the read.  I've tried to
> reproduce
> > it with a simple case but so far no luck.
> >
> > Thanks for your suggestion and for looking at it.
> >
> > Ron Teitelbaum
> >
> >> -----Original Message-----
> >> From: Andrew P. Black
> >> Sent: Thursday, February 15, 2007 7:10 PM
> >>
> >> I'm making a wild guess here, Ron:  are you ABSOLUTELY sure that your
> >> sortblock represents an ordering relation, that is, for EVERY pair of
> >> values a and b,  ((sortBlock value: a value: b)  AND  (sortBlock
> >> value: b value: a)) implies a = b?   If this is not the case, the
> >> sort will very likely run off the end of the legitimate values in the
> >> array, and into the nils.
> >>
> >> Andrew
> >>
> >>
> >> On 12 Feb 2007, at 08:09, Ron Teitelbaum wrote:
> >>
> >>> Hi all,
> >>>
> >>> I just ran into an error and was wondering if anyone has ever seen
> >>> this
> >>> before.  I have a sorted collection with a computationally
> >>> intensive sorting
> >>> block.
> >>>
> >>> It blew up on a nil.  Low res jpg attached.
> >>>
> >>> It appears to have worked for the first 12 items:
> >>> newCollection in SortedCollection>>collect: has 12 items in it.
> >>>
> >>> The 13th item returned a nil.  This is because array on the sorted
> >>> collection starts with 16 nils.  firstIndex is set to 17.
> >>>
> >>> Notice on the jpg it says firstIndex = 17 but index is 13.  So I
> >>> can only
> >>> conclude that since index is 13 and 12 other objects were processed
> >>> successfully that the array and first index changed during processing.
> >>>
> >>> I am not sure that I can reproduce this error.  I've been running
> >>> the same
> >>> code for quite a while without seeing it.  If I can I will post it.
> >>>
> >>> Has anyone seen this before or does anyone know why this may have
> >>> happened?
> >>>
> >>> Thanks for your help!
> >>>
> >>> Ron Teitelbaum
> >>> [hidden email]
> >>>
> >>> http://bugs.impara.de/view.php?id=6030
> >>> <SortedCollectionError.jpg>
> >>>
> >>
> >
> >
> >
> >
>
>



Reply | Threaded
Open this post in threaded view
|

Re: Error in SortedCollection do:

Prof. Andrew P. Black
In reply to this post by Ron Teitelbaum
Ron,

I bet Andreas is right.  I didn't notice in your original post that this code was concurrent.  Let us know!

On 15 Feb 2007, at 16:59, Ron Teitelbaum wrote:

^[:a :b | (a nodeID bitXor: self networkID) < (b nodeID bitXor: self

networkID)]


The values are 256 bit integers.  self networkID is an instance variable on

the model that creates the sortBlock.


On a different note: I inferred that you were concerned with speed.  If this is the case, wouldn't it make sense to do the self networkID once when the block is created, rather than 2 n log n times when you do the comparisons. 

Andrew

Andrew P. Black
Department of Computer Science
Portland State University
+1 503 725 2411





Reply | Threaded
Open this post in threaded view
|

RE: Error in SortedCollection do:

Ron Teitelbaum
________________________________________
From: Andrew P. Black
Sent: Friday, February 16, 2007 3:39 PM

On a different note: I inferred that you were concerned with speed. If this
is the case, wouldn't it make sense to do the self networkID once when the
block is created, rather than 2 n log n times when you do the comparisons.

        Andrew

That's interesting, I hadn't really considered it, since the block lives in
a model that creates the block but it is used somewhere else I figured that
the networkID would be a temp in the block.  But now that you point it out,
I can see that the call would be made to the model from the transferred
context.  That's a really silly mistake easily remedied by a temp variable.
Thanks for the suggestion.  

Ron