Strange InvalidDirectoryErrors on Unix

Andreas.Raab
Hi Folks -

I had an odd effect today and I'm wondering if other people might have
seen something similar in the past. One of our Linux servers (which has
been running happily for a week) has an update process running which
every five minutes checks to see if there is new data on the disk and if
so, updates itself from it.

Today, I noticed that it stopped updating itself, and investigating
showed that it repeatedly hit an "InvalidDirectoryError" when trying to
update itself. Trying to understand what was going on, I went into the
live system via VNC and stepped through the update method, and suddenly
it started working again! No more InvalidDirectoryErrors, updating went
fine.

I'm at a complete loss as to what might have caused this error.
Unfortunately (since I was doing it on the live system) I couldn't
investigate the error condition more closely, but there is a possibility
that the directory service had been used from different threads. Does
anyone have any ideas? (The only thing I am certain of is that attaching
VNC was *not* what made it work again - I saw the error while I was in
there with VNC, and it started working again after I stepped through the
method.)

Any help is greatly appreciated. So are similar experiences and
workarounds for the issue. The server itself is running Fedora Core 4.

Cheers,
   - Andreas

RE: Strange InvalidDirectoryErrors on Unix

J J-6
What kind of filesystem is this on?  Are you accessing the directory via NFS
or any other indirect means?  What kind of modifications are happening to
the directory?

I have not seen this in Squeak, but it reminds me of what happens when
something causes an inode change underneath a running process.

Example:

On a Unix/Linux box you cd to /some/place/nice.  Then in another shell you
cd to /some/place and rm -rf nice.  Then copy a new "nice" directory from
somewhere else, so that the directory structure is as it was before.  Now
you go back to the original shell and ls, but you get the error ". no such
file or directory" or something similar.  Clearly the directory does exist,
but the problem is your shell is still referencing the previous inodes that
are no longer valid.  I have even seen certain backup systems cause
this.
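
The effect can be approximated in a short Python sketch. This is only an illustration under assumptions: an open directory descriptor stands in for the first shell's working directory, the directory lives in a temporary location, and the function name and the keys of the returned dictionary are made up for this example. It has nothing to do with how Squeak or its VM handles directories.

```python
import os
import shutil
import tempfile


def replace_directory_under_open_handle():
    """Hold a directory open, delete and recreate it, then compare inodes."""
    base = tempfile.mkdtemp()
    path = os.path.join(base, "nice")
    os.mkdir(path)

    # The descriptor plays the role of the first shell's working directory.
    held = os.open(path, os.O_RDONLY)
    try:
        # The "other shell": remove the directory, then create a new one
        # with the same name.
        shutil.rmtree(path)
        os.mkdir(path)

        held_stat = os.fstat(held)  # the old, now unlinked directory
        path_stat = os.stat(path)   # what the name resolves to now
        return {
            "held_inode": held_stat.st_ino,
            "path_inode": path_stat.st_ino,
            "held_link_count": held_stat.st_nlink,
            "path_lookup_ok": os.path.isdir(path),
        }
    finally:
        os.close(held)
        shutil.rmtree(base)


if __name__ == "__main__":
    print(replace_directory_under_open_handle())
```

On a typical Linux file system the two inode numbers should differ, because the old directory stays alive for as long as the descriptor is open, while a fresh lookup by path finds the new directory without trouble.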

So in your case, maybe the VM or your code is holding a reference to the
directory somehow and another process is causing an inode change in some
way, which causes the error.  When you run it in the debugger, it somehow
bypasses the cache, causing it to work again.

Just a wild guess, but that's all I can think of.

Hope it helps,
Jason

>From: Andreas Raab <[hidden email]>
>Reply-To: The general-purpose Squeak developers
>list<[hidden email]>
>To: The general-purpose Squeak developers
>list<[hidden email]>, Squeak Virtual Machine
>Development Discussion<[hidden email]>
>Subject: Strange InvalidDirectoryErrors on Unix
>Date: Mon, 09 Apr 2007 22:30:32 -0700

Re: Strange InvalidDirectoryErrors on Unix

Andreas.Raab
J J wrote:
> What kind of filesystem is this on?  Are you accessing the directory via
> NFS or any other indirect means?  What kind of modifications are
> happening to the directory?

It's an ext3 file system locally mounted. The modifications consist of
adding and removing both files and directories.

> I have not seen this in Squeak, but it reminds me of what happens when
> something causes an inode change underneath a running process.

Right, I've seen that too. But if you look at the FileDirectory code
you'll notice that the only thing it uses is a fully qualified path; it
does not have any internal state other than that. Because of this, the
VM ought to check every time you are querying, or, if it caches, it ought
to know when to recheck (most definitely upon failure).
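
As a rough sketch of the "resolve the full path on every query, and recheck upon failure" behaviour, here it is in Python rather than in the VM. The function name, the retry count and the delay are assumptions made for this example; this is not the actual FileDirectory or VM code.

```python
import os
import time


def list_directory(path, retries=1, delay=0.1):
    """List `path`, resolving the full path again on every attempt."""
    attempt = 0
    while True:
        try:
            # os.listdir is handed the path string each time; nothing
            # about the directory is remembered between calls.
            return sorted(os.listdir(path))
        except (FileNotFoundError, NotADirectoryError):
            if attempt >= retries:
                raise
            attempt += 1
            time.sleep(delay)
```

Since no handle or cached state survives between calls, a directory that was removed and recreated under the same name is simply found again on the next lookup.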

Cheers,
   - Andreas

Re: Strange InvalidDirectoryErrors on Unix

tblanchard
In reply to this post by Andreas.Raab
I have a similar experience with an app that writes out a temporary  
file.  It always uses the same name for the file - occasionally it  
will begin to produce errors to the effect that it could not write  
the file.  The file is not locked or anything, but it fails just the  
same.  I haven't had any luck finding the cause because of its  
transient nature.
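
For a transient failure like this, one option is to record the operating system's error details at the moment the write fails. The following Python sketch shows the idea. The function name and the fields it returns are assumptions for this example, and it does not explain the failure described above; it only captures what the OS reported.

```python
import errno
import os


def write_with_diagnostics(path, data):
    """Write `data` to `path`; on failure, return details about the error."""
    try:
        with open(path, "wb") as handle:
            handle.write(data)
        return None
    except OSError as error:
        parent = os.path.dirname(os.path.abspath(path))
        return {
            "errno": error.errno,
            "errno_name": errno.errorcode.get(error.errno, "unknown"),
            "message": error.strerror,
            "parent_exists": os.path.isdir(parent),
            "parent_writable": os.access(parent, os.W_OK),
        }
```

Logging the returned dictionary whenever it is not None would show, for instance, whether the parent directory was missing or not writable at that instant.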

-Todd Blanchard


Re: Strange InvalidDirectoryErrors on Unix

Andreas.Raab
In reply to this post by Andreas.Raab
Hi -

Turns out my analysis was a red herring - the InvalidDirectoryError I
received was actually a genuine problem which caused the whole
computation to abort before it even reached the updating part. I got
confused because I didn't realize that the problem might occur
before the process ever gets to the updating part, so invoking the
updater manually would of course work.
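
One way to make this kind of confusion less likely is to tag every failure with the stage it came from. Here is a minimal Python sketch of that idea. `StageError`, `run_cycle` and the stage names are hypothetical and are not taken from the server code discussed in this thread.

```python
class StageError(Exception):
    """Records which stage of a cycle failed (hypothetical helper)."""

    def __init__(self, stage, cause):
        super().__init__(f"{stage} failed: {cause!r}")
        self.stage = stage
        self.cause = cause


def run_cycle(stages):
    """Run (name, callable) pairs in order; stop at the first failure."""
    completed = []
    for name, action in stages:
        try:
            action()
        except Exception as error:
            # Keep the original exception as the cause, but say where it
            # happened.
            raise StageError(name, error) from error
        completed.append(name)
    return completed
```

With stages such as ("check for new data", check) and ("update", update), an error raised during the check is reported under that name, so it cannot be mistaken for a failure of the updater itself.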

Apologies for making a fuss over nothing.

Cheers,
   - Andreas
