[vwnc] [Bug] UHE saving large image to network drive

Re: [vwnc] [Bug] UHE saving large image to network drive

Christian Haider
Great that you found a solution to the problem!
Does this mean that we will get a new VM that can be used with the 7.6 image, or do we have to wait for 7.7?

Cheers,
        Christian

> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On Behalf Of Valloud, Andres
> Sent: Tuesday, August 19, 2008 9:01 AM
> To: [hidden email]
> Subject: Re: [vwnc] [Bug] UHE saving large image to network drive
>
> Christian,
>
> We have just characterized the issue more precisely.  Writes
> over the network may fail when the write block is too large
> because then Windows fails to allocate enough memory to
> buffer the write.  Using unbuffered writes is not an option
> we would like to consider because, besides writes being
> slower when done locally, it has several additional
> restrictions which would complicate the implementation of the
> VM.  Instead, the interim solution we just verified to work
> is to recursively halve the write block until writes succeed,
> and to do this only when receiving error
> ERROR_NO_SYSTEM_RESOURCES (1450).
>
> Note that the fix described in the paragraph above, which we
> just tested and verified to work, has not yet gone through
> our verification process.
>
> This also shows that the image size is not the trigger of
> this problem.  Rather, what causes the issue to crop up is
> the size of the memory segments allocated by the VM.  In
> other words, a 400 MB image with 10 MB old space segments would
> most likely save successfully, while a 200 MB image with a
> 100 MB object in it runs a higher risk of failing to save over
> a Windows network to a shared drive.
>
> Andres.
>
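For readers following along, here is a minimal sketch of the halving strategy described in the message above. It is written in Python against a simulated writer, not the VM's actual C code, and it uses a loop rather than recursion. `WriteError`, `write_halving` and `make_limited_writer` are hypothetical names invented for this illustration, and the byte counts are scaled down (think of each byte as standing in for 1 MB). It also shows why the largest single block matters rather than the total amount written.

```python
ERROR_NO_SYSTEM_RESOURCES = 1450  # Windows error code named in the message above


class WriteError(OSError):
    """Hypothetical stand-in for a failed OS write carrying a numeric code."""

    def __init__(self, code):
        super().__init__(f"write failed with error {code}")
        self.code = code


def write_halving(write, data):
    """Write all of `data`, halving the block size on error 1450 only.

    `write(chunk)` is a hypothetical callable that writes one block and
    raises WriteError on failure. Returns the sizes of the blocks written.
    """
    sizes = []
    offset = 0
    block = len(data)
    while offset < len(data):
        chunk = data[offset:offset + block]
        try:
            write(chunk)
        except WriteError as err:
            # Any other error, or a 1-byte block that still fails, is fatal.
            if err.code != ERROR_NO_SYSTEM_RESOURCES or block <= 1:
                raise
            block = (block + 1) // 2
            continue
        sizes.append(len(chunk))
        offset += len(chunk)
    return sizes


def make_limited_writer(limit, sink):
    """Simulated network write that rejects blocks larger than `limit` bytes."""
    def write(chunk):
        if len(chunk) > limit:
            raise WriteError(ERROR_NO_SYSTEM_RESOURCES)
        sink.extend(chunk)
    return write


if __name__ == "__main__":
    sink = bytearray()
    writer = make_limited_writer(limit=64, sink=sink)
    print(write_halving(writer, bytes(10)))   # small segment: one write
    print(write_halving(writer, bytes(100)))  # large segment: halved first
```

In this simulation the 10-unit segment goes out in a single write, while the 100-unit segment fails once and is then written as two 50-unit blocks. Errors other than 1450 are re-raised unchanged, matching the "only when receiving error 1450" condition.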
>
> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On Behalf Of Christian Haider
> Sent: Saturday, August 09, 2008 2:51 AM
> To: [hidden email]
> Subject: Re: [vwnc] [Bug] UHE saving large image to network drive
>
> I did some more experiments (without success, though):
>
> - both client and server have a 1 Gbit card, but the routers
> can only handle 100 Mbit. So I connected the client directly
> to the server, giving me a 1 Gbit LAN connection (but I'm sure
> the cable cannot handle the full speed) - same error
> - used a UNC filename instead of the mapped drive - same error
> - looked at the file handles as Eliot suggested (I'm not sure
> how, though). Task Manager shows 86 handles (I guess those are
> files and sockets). After doing 'IOAccessor closeAll' the
> number dropped from 86 to 79 - same error
> - closing all windows and doing several GCs - same error
> - saving to an 8.3-compatible filename (s:\image\a.im) - same error.
> I would rule out a filename problem, since part of the
> file (around 118 MB of the 149 MB) was written before the
> error occurred.
> - saved to a network path where the disk does not have a
> shadow copy - same error
> - closing all programs (Outlook and Firefox; I couldn't stop the
> AVG virus scanner, though) - same error
> - saving to another client computer instead of the server - same error
> - saving from another WinXP client - same error
> - starting and saving locally on the server (Win2003) - works
> - starting on the server and saving across the LAN - same error
> - assert and debug VMs don't show any problems.
>
> Here is a script for a reproducible test case (the network
> path is important!):
>
> ObjectMemory globalGarbageCollect.
> ((1 to: 1650) collect: [:a | (1 to: 10000) collect: [:b | b]]) inspect.
> ObjectMemory snapshotAs: 's:\image\a1650' thenQuit: false
>
> Do it in a workspace of the vanilla visual.im. For me the
> error occurs with 1650 and above; 1649 and below work fine.
> The saved image size with 1649 is 77880 KB.
> It would be interesting to know whether others can reproduce
> the error (probably with a different number).
>
> Preliminary conclusions:
> - it has to do with the LAN
> - it has to do with the image size (not content, I think)
> - it might be a Windows problem (XP and 2003)
> - the problem seems to be in the VM's network writes
>
> Hth,
> Christian
>
> > -----Original Message-----
> > From: [hidden email]
> > [mailto:[hidden email]] On Behalf Of Steven Kelly
> > Sent: Friday, August 08, 2008 11:01 AM
> > To: [hidden email]
> > Subject: Re: [vwnc] [Bug] UHE saving large image to network drive
> >
> > Paul Baumann wrote:
> > >
> > > '\\rv-bp-iffs-01\migrate_server\Smalltalk\builds\12.200\12.200 053.0\'
> > > asFilename withSeparator.
> > >
> > > a FATFilename('\\rv-bp-iffs-01\migrate_server\smalltal\builds\12.200\12200053.0\')
> >
> > I've seen this problem occasionally over the years when working with
> > any network file (maybe only UNCs, not mapped drives). VW doesn't
> > recognize the UNC network disk as NTFS, reverts to FATFilename, and
> > truncates the directory and file parts to 8.3. Most problems
> > disappeared when FATFilename was corrected (5i.*?) to default to long
> > (254-character) filename components when it couldn't get the correct
> > result from the OS call in getFileSystemAttributes:. Still,
> > occasionally we see the error: maybe getFileSystemAttributes:
> > in the VM still has some old default code? Line 1081 in 7.6 ntfile.c
> > seems to set the file type to FAT and the maximum file component
> > length to 12 (8.3) if GetVolumeInformation fails but the drive is not
> > invalid, and then returns the result to the image as a valid set of
> > file system attributes.
> > Presumably that's what's happening here.
> >
> > HTH,
> > Steve
> >
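To make the truncation in Paul's example concrete, here is a rough Python sketch of an 8.3 squeeze that happens to reproduce the FATFilename output quoted above. The rule is an assumption reverse-engineered from that single example; `to_8_3` and `truncate_unc_path` are hypothetical names, and the real FATFilename logic in VisualWorks (or Windows short-name generation) may well differ.

```python
def to_8_3(component):
    """Squeeze one path component into an 8.3-style name (illustrative rule)."""
    base, dot, ext = component.rpartition(".")
    if not dot:  # no extension at all
        base, ext = component, ""
    fits = (len(base) <= 8 and len(ext) <= 3
            and " " not in component and "." not in base)
    if fits:
        return component  # already 8.3-compatible, keep as-is
    # Drop dots and spaces from the base, cut to 8 characters, lowercase.
    base = base.replace(".", "").replace(" ", "")[:8].lower()
    ext = ext.replace(" ", "")[:3].lower()
    return f"{base}.{ext}" if ext else base


def truncate_unc_path(path):
    """Apply to_8_3 to every component after the \\\\server\\share prefix."""
    parts = path.split("\\")
    # For '\\\\server\\share\\dir\\' the first four parts are '', '', server, share.
    head, tail = parts[:4], parts[4:]
    return "\\".join(head + [to_8_3(p) if p else p for p in tail])


if __name__ == "__main__":
    original = ("\\\\rv-bp-iffs-01\\migrate_server\\Smalltalk"
                "\\builds\\12.200\\12.200 053.0\\")
    print(truncate_unc_path(original))
```

The server and share names are left untouched here because they are unchanged in the quoted output; only the directory components after them get squeezed.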
> > > -----Original Message-----
> > > From: [hidden email]
> > > [mailto:[hidden email]] On Behalf Of Valloud, Andres
> > > Sent: Thursday, August 07, 2008 2:45 AM
> > > To: [hidden email]
> > > Subject: Re: [vwnc] [Bug] UHE saving large image to network drive
> > >
> > > Christian,
> > >
> > > Initial examination reveals that indeed, as Paul Baumann commented,
> > > this is an issue of Windows itself running out of resources.  I am not
> > > sure how the VM should be instructed to cope with the OS telling it
> > > there is no way to perform what has been requested.  To some extent
> > > I'd be inclined to think that a retrying strategy could work, but
> > > unfortunately error 1450 does not seem to necessarily indicate that the
> > > error is transient.  In other words, at least in principle there could
> > > be a valid reason for a 1450 error, and retrying might not necessarily
> > > fix it nor be a reasonable course of action.
> > >
> > > From the material I reviewed, at first sight I think that what is
> > > going on in your case is that the client box floods the 100 Mbps
> > > Ethernet connection faster than it can write data to the server,
> > > eventually the client cannot allocate more memory for buffering the
> > > write, and then a 1450 error occurs.
> > >
> > > If you save an image with the Task Manager open, would you mind
> > > checking the values for Physical Memory and Kernel Memory at the
> > > point when the VM reports a failure?  You may want to take a
> > > screenshot of the Task Manager at the point when the debugger
> > > comes up so it's easier to take a look at things.  Feel free to
> > > send it directly to me.
> > >
> > > Also, are you booting the client Windows with the /3GB switch?  This
> > > has the effect of limiting the address space of the OS to 1GB, and in
> > > turn can cause errors like 1450 to occur more frequently.
> > >
> > > Apparently, a way to avoid this is to request that Windows does not
> > > buffer writes to the image file as they happen.  However, this will
> > > almost certainly have the side effect of making image writes
> > > considerably slower.  For example, without buffering, writes from a
> > > client over to a server will block the image saving process until the
> > > server acknowledges that the write was committed to disk.  This implies
> > > a network round trip per file write operation.  Even locally, this
> > > would potentially make writing image files take considerably more time.
> > >
> > > Note that we have not yet decided exactly what to do with this, and we
> > > are not done with our research yet either.
> > >
> > > Andres.
> > >
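As a back-of-the-envelope illustration of the "one round trip per write" point above, the following Python sketch estimates a lower bound on the waiting time that unbuffered writes would add. The function name and all numbers (image size, write size, round-trip time) are hypothetical and chosen only for illustration; nothing here was measured on the setup discussed in this thread.

```python
import math


def round_trip_overhead_seconds(image_bytes, write_bytes, round_trip_seconds):
    """Lower bound on time spent waiting for acknowledgements alone.

    Assumes every write blocks for one full network round trip and ignores
    transfer time and disk commit time on the server.
    """
    writes = math.ceil(image_bytes / write_bytes)
    return writes * round_trip_seconds


if __name__ == "__main__":
    # Hypothetical figures: 150 MB image, 64 KB writes, 0.5 ms round trip.
    overhead = round_trip_overhead_seconds(150 * 1024 * 1024, 64 * 1024, 0.0005)
    print(f"{overhead:.1f} s of pure round-trip waiting")
```

Smaller write blocks mean more writes and therefore more round trips, which is the trade-off against buffering described in the message.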
> > >
> > > -----Original Message-----
> > > From: Christian Haider [mailto:[hidden email]]
> > > Sent: Wednesday, August 06, 2008 10:46 PM
> > > To: Valloud, Andres; [hidden email]
> > > Subject: Re: [vwnc] [Bug] UHE saving large image to network drive
> > >
> > > Thanks for looking into it.
> > >
> > > The system error looks like this:
> > > self:  a SystemError(#'io error',1450)
> > > parameter:  1450
> > > name:  #'io error'
> > >
> > > HTH,
> > >         Christian
> > >
> > > > -----Original Message-----
> > > > From: [hidden email]
> > > > [mailto:[hidden email]] On Behalf Of Valloud, Andres
> > > > Sent: Thursday, August 07, 2008 5:30 AM
> > > > To: [hidden email]
> > > > Subject: Re: [vwnc] [Bug] UHE saving large image to network drive
> > > >
> > > > Christian,
> > > >
> > > > In order to debug this issue, we need to know the numeric
> > > > error that the VM reports.  From the stack you posted, the
> > > > interesting section to look at is this:
> > > >
> > > > SystemError>>handleErrorFor:
> > > > NTFSFilename(Filename)>>snapshot:
> > > >
> > > > In particular, the systemError object will have the numeric error in
> > > > the instance variable called parameter.  Would you mind letting us
> > > > know the value of the systemError's parameter?
> > > >
> > > > Andres.
> > > >
> > > > ________________________________
> > > >
> > > > From: [hidden email]
> > > > [mailto:[hidden email]] On Behalf Of Christian Haider
> > > > Sent: Saturday, August 02, 2008 12:57 AM
> > > > To: vwnc List
> > > > Subject: [vwnc] [Bug] UHE saving large image to network drive
> > > >
> > > >
> > > > Usually, I work with my images on a network drive.
> > > > Unfortunately, when the image gets too large (only 105 or 141 MB), I
> > > > get a UHE when trying to save the image (see attached).
> > > >
> > > > The exception reliably occurs in VW7.5 and 7.6.
> > > > Client WinXP SP3 connected via 100 Mbit LAN to a directory on
> > > > Win Server 2003 with NTFS (no compression).
> > > >
> > > > "save image as..." and "save image" to a local drive work flawlessly.
> > > >
> > > > The reason seems to be an IO error which is not adequately handled
> > > > (by the VM?).
> > > >
> > > > cheers,
> > > >     Christian
> > > >
> > > _______________________________________________
> > > vwnc mailing list
> > > [hidden email]
> > > http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Re: [vwnc] [Bug] UHE saving large image to network drive

Arden Thomas
This will be fixed in VWDev builds, and for VW7.7.

If any customer is being impacted by this in VW7.6, and needs earlier resolution, please contact me.

Christian, thanks for finding this issue!

Regards

Arden Thomas

Arden Thomas
Cincom Smalltalk Product Manager
845 296 0686

Cincom Smalltalk - It makes hard things easier, the impossible, possible

"Simplicity is the Ultimate Sophistication" - Leonardo Da Vinci
