filesize reporting 0 for very large files

Chris Muller-3
I have a 3.2GB file, but Squeak reports its #fileSize as 0 (in the
DirectoryEntry).  It seems to occur for any file that large.

Is it an overflow condition in the VM?  Is it possible to fix?


Re: filesize reporting 0 for very large files

David T. Lewis
On Thu, Mar 15, 2012 at 10:35:47PM -0500, Chris Muller wrote:
> I have a 3.2GB file, but Squeak reports its #fileSize as 0 (in the
> DirectoryEntry).  It seems to occur for any file that large.
>
> Is it an overflow condition in the VM?  Is it possible to fix?

It works fine with an interpreter VM compiled in 64-bit mode. I
cannot look into it in detail now, but an educated guess is that
it relates to the definition of size_t for 32-bit programs, and
that for the kind of work you are doing here a 64-bit VM may be
in order.

Dave



Re: filesize reporting 0 for very large files

Andreas.Raab
On 3/16/2012 5:45, David T. Lewis wrote:

> On Thu, Mar 15, 2012 at 10:35:47PM -0500, Chris Muller wrote:
>> I have a 3.2GB file, but Squeak reports its #fileSize as 0 (in the
>> DirectoryEntry).  It seems to occur for any file that large.
>>
>> Is it an overflow condition in the VM?  Is it possible to fix?
>
> It works fine with an interpreter VM compiled in 64-bit mode. I
> cannot look into it in detail now, but an educated guess is that
> it relates to the definition of size_t for 32-bit programs, and
> that for the kind of work you are doing here a 64-bit VM may be
> in order.

That shouldn't be necessary. The VMs have long been updated to support
64-bit file sizes, and to the best of my knowledge this generally works.
If it doesn't, I would suspect a specific platform problem; it would
help to know what VM and what platform the problem occurs on.

Cheers,
   - Andreas



Re: filesize reporting 0 for very large files

Chris Muller-4
In reply to this post by David T. Lewis
Does a 32-bit VM mean that no individual primitive can have an
argument or return value greater than 32 bits?  File size is something
that, in 2012 with HD video recording devices, can very easily exceed
32 bits, so it would seem to be overkill for ALL objects to have to be
larger than 32 bits (as in a 64-bit VM) just for this one primitive.




Re: filesize reporting 0 for very large files

Bert Freudenberg
On 18.03.2012, at 17:10, Chris Muller wrote:

> Does a 32-bit VM mean that no individual primitives can have an
> argument or return value greater than 32 bits?

No. It just means that any OOP is 32 bits wide. The file size primitive returns a LargeInteger instance, but its OOP will be a 32 bit pointer.

>  File-size is something
> that, in 2012 with HD video recording devices, can very easily exceed
> 32-bits so it would seem to be overkill for ALL objects to have to be
> larger than 32 bits (as in a 64-bit VM) just for this one primitive..


As Andreas wrote, it works fine on Windows in a 32 bit VM. The problem must be in your platform's support code.

- Bert -



Re: filesize reporting 0 for very large files

Chris Muller-3
It is on my home machine running a recent Cog VM on Ubuntu Linux
(10.04 I think).

By platform-support code, does that refer to the VM plugins or
something put out by Ubuntu?

Thanks..



Re: filesize reporting 0 for very large files

Bert Freudenberg

On 18.03.2012, at 18:13, Chris Muller wrote:

> It is on my home machine running a recent Cog VM on Ubuntu Linux
> (10.04 I think).
>
> By platform-support code, does that refer to the VM plugins or
> something put out by Ubuntu?
>
> Thanks..

The VM.

- Bert -







Re: filesize reporting 0 for very large files

Hans-Martin Mosner
In reply to this post by Bert Freudenberg
On 18.03.2012 17:32, Bert Freudenberg wrote:

> On 18.03.2012, at 17:10, Chris Muller wrote:
>
>> Does a 32-bit VM mean that no individual primitives can have an
>> argument or return value greater than 32 bits?
> No. It just means that any OOP is 32 bits wide. The file size primitive returns a LargeInteger instance, but its OOP will be a 32 bit pointer.
>
>>  File-size is something
>> that, in 2012 with HD video recording devices, can very easily exceed
>> 32-bits so it would seem to be overkill for ALL objects to have to be
>> larger than 32 bits (as in a 64-bit VM) just for this one primitive..
>
> As Andreas wrote, it works fine on Windows in a 32 bit VM. The problem must be in your platform's support code.
>
> - Bert -
>
>
>
The problem is in the Linux support code.
I reported it in Mantis quite some time ago: http://bugs.squeak.org/view.php?id=7522. I don't have an easy
solution because it seemed to affect several places (including platform-independent code, IIRC), so I hoped that some VM
maintainer would have a look at it.

Cheers,
Hans-Martin


Re: filesize reporting 0 for very large files

Nicolai Hess-3


On 18 March 2012 at 21:06, Nicolai Hess <[hidden email]> wrote:
For the 32-bit Unix VM,
primitiveDirectoryLookup and primitiveFileSize don't work with files larger than
2^31 bytes (2 GiB).
Possible workaround/fix:
Build with -D_FILE_OFFSET_BITS=64
and replace ftell and fseek in sqFilePluginBasicPrims with ftello and fseeko.
Reading the file size for files larger than 2^31 bytes then works, but I don't know if
there are any side effects.
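
A minimal sketch of that workaround, assuming glibc on Linux. The helper name large_file_size is hypothetical and this is not the code in sqFilePluginBasicPrims; it only shows the fseeko/ftello pattern with a 64-bit off_t:

```c
/* Assumption: glibc on Linux. Must be defined before any system header
   (or passed as -D_FILE_OFFSET_BITS=64) so off_t is 64 bits in a 32-bit build. */
#define _FILE_OFFSET_BITS 64

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: returns the size of the file at `path` in bytes,
   or -1 if the file cannot be opened or measured. */
off_t large_file_size(const char *path)
{
    FILE *f;
    off_t size;

    assert(path != NULL);
    f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    /* fseeko/ftello work in off_t; fseek/ftell work in long, which is
       32 bits on 32-bit Linux and cannot represent offsets >= 2^31. */
    if (fseeko(f, 0, SEEK_END) != 0) {
        fclose(f);
        return -1;
    }
    size = ftello(f);   /* ftello itself returns -1 on failure */
    fclose(f);
    return size;
}
```

The result should stay in an off_t (or another 64-bit type) all the way to the caller; assigning it to an int or long in a 32-bit build would reintroduce the limit.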

regards
Nicolai




Re: filesize reporting 0 for very large files

David T. Lewis
In reply to this post by Chris Muller-4
On a Linux system, the data types that represent file positions are 32 bits
in size when the program is compiled in 32-bit mode (using the -m32 compiler
option), and they are 64 bits in size when the program is compiled in 64-bit
mode. The relevant data types are off_t and size_t, and if you compile the
following in 32-bit mode (-m32) and compare the same program compiled in
64-bit mode, you can see the difference:

  #include <stdio.h>
  #include <sys/types.h>
  int main(void) {
    /* sizeof yields a size_t, so print it with %zu rather than %d */
    printf("off_t is %zu\n", sizeof(off_t));
    printf("size_t is %zu\n", sizeof(size_t));
    return 0;
  }

The VM is a program like any other, so when compiled as a 32-bit application,
file operations will fail if sizes and offsets exceed the range of an
off_t data type.

If you are dealing with very large files, you will need to consider using
a VM compiled in 64-bit mode, or you will need to find some mechanism to
limit data file size.

Cog is currently limited to 32-bit mode, so changing VMs may not be
acceptable if performance is a key concern. An interpreter VM compiled
in 64-bit mode is entirely suitable for server applications, except that
FFI and certain plugins will not be available.

Most users do not require large address spaces or data files, and the VMs
in general circulation are compiled in 32-bit mode in order to provide
support for a wide range of plugins.

Dave




Re: filesize reporting 0 for very large files

David T. Lewis
In reply to this post by Hans-Martin Mosner
On Sun, Mar 18, 2012 at 07:28:11PM +0100, Hans-Martin Mosner wrote:

> The problem is in the Linux support code.
> I reported it in Mantis quite some time ago: http://bugs.squeak.org/view.php?id=7522. I don't have an easy
> solution because it seemed to affect several places (including platform-independent code, IIRC), so I hoped that some VM
> maintainer would have a look at it.

Hans-Martin,

Thanks for reporting this on Mantis, it is very helpful for maintaining
information on problems like this. As you note in the bug report, the issue
relates to the stat() function. I do not know if there is an alternative
way to address the problem without using stat(), but I know that stat()
uses the defined data type off_t to refer to file positions, and this is
a 32-bit data type for a VM compiled in 32-bit mode (i.e. most VMs in
general circulation). FWIW, this does work fine if the VM is compiled
in 64-bit mode for 64-bit Linux systems.

Dave



Re: filesize reporting 0 for very large files

Eliot Miranda-2
In reply to this post by David T. Lewis
Hi David,

On Sun, Mar 18, 2012 at 6:30 PM, David T. Lewis <[hidden email]> wrote:
> On a Linux system, the data types that represent file positions are 32 bits
> in size when the program is compiled in 32-bit mode (using the -m32 compiler
> option), and they are 64 bits in size when the program is compiled in 64-bit
> mode. The relevant data types are off_t and size_t, and if you compile the
> following in 32-bit mode (-m32) and compare the same program compiled in
> 64-bit mode, you can see the difference:
>
>  #include <stdio.h>
>  #include <sys/types.h>
>  int main(void) {
>    printf("off_t is %zu\n", sizeof(off_t));
>    printf("size_t is %zu\n", sizeof(size_t));
>    return 0;
>  }

Not quite.  One can modify this by defining something like _LARGEFILE64_SOURCE at compile time.  e.g. see _LARGEFILE_SOURCE _LARGEFILE64_SOURCE & _FILE_OFFSET_BITS in  http://www.delorie.com/gnu/docs/glibc/libc_13.html.  I'll check that the appropriate one is defined when building Cog asap.
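
As a sketch of the effect of those macros, assuming glibc on Linux: with _FILE_OFFSET_BITS set to 64 before any system header, off_t should be 8 bytes even in a -m32 build, while size_t stays at 4. The helper names off_t_bits and print_type_sizes are made up for illustration:

```c
/* Assumption: glibc on Linux. The macro must come before every system header;
   passing -D_FILE_OFFSET_BITS=64 on the compiler command line has the same effect. */
#define _FILE_OFFSET_BITS 64

#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <sys/types.h>

/* Compile-time check: the build fails if off_t is narrower than 64 bits. */
static_assert(sizeof(off_t) * CHAR_BIT >= 64, "off_t is not 64 bits wide");

/* Hypothetical helper: width of off_t in bits for this build. */
int off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}

/* Prints both sizes; %zu is the conversion for sizeof results. */
void print_type_sizes(void)
{
    printf("off_t is %zu\n", sizeof(off_t));
    printf("size_t is %zu\n", sizeof(size_t));
}
```

The static_assert turns a silently missing flag into a build error, which may be useful in a build system where the define could get lost.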

 

--
best,
Eliot




Re: filesize reporting 0 for very large files

David T. Lewis
On Sun, Mar 18, 2012 at 06:55:10PM -0700, Eliot Miranda wrote:

> Hi David,
>
> Not quite.  One can modify this by defining something like
> _LARGEFILE64_SOURCE at compile time.  e.g.
> see _LARGEFILE_SOURCE _LARGEFILE64_SOURCE & _FILE_OFFSET_BITS in
> http://www.delorie.com/gnu/docs/glibc/libc_13.html.  I'll check that the
> appropriate one is defined when building Cog asap.

Excellent, thank you!

Dave



Re: filesize reporting 0 for very large files

David T. Lewis

(CC to vm-dev)

I note that Nicolai Hess has just added a note to the Mantis issue with
a similar tip and pointers to additional things that may need attention
in the support code.

http://bugs.squeak.org/view.php?id=7522

Dave



Re: filesize reporting 0 for very large files

Bert Freudenberg


Found at http://www.suse.de/~aj/linux_lfs.html
=======================
For using LFS in user programs, the programs have to use the LFS API. This involves recompilation and changes of programs. The API is documented in the glibc manual (the libc info pages) which can be read with e.g. "info libc".
In a nutshell for using LFS you can choose either of the following:
        • Compile your programs with "gcc -D_FILE_OFFSET_BITS=64". This forces all file access calls to use the 64 bit variants. Several types change also, e.g. off_t becomes off64_t. It's therefore important to always use the correct types and to not use e.g. int instead of off_t. For portability with other platforms you should use getconf LFS_CFLAGS which will return -D_FILE_OFFSET_BITS=64 on Linux platforms but might return something else on e.g. Solaris. For linking, you should use the link flags that are reported via getconf LFS_LDFLAGS. On Linux systems, you do not need special link flags.
        • Define _LARGEFILE_SOURCE and _LARGEFILE64_SOURCE. With these defines you can use the LFS functions like open64 directly.
        • Use the O_LARGEFILE flag with open to operate on large files.
A complete documentation of the feature test macros like _FILE_OFFSET_BITS and _LARGEFILE_SOURCE is in the glibc manual (run e.g. "info libc 'Feature Test Macros'").
The LFS API is also documented in the LFS standard which is available at http://ftp.sas.com/standards/large.file/x_open.20Mar96.html.
=======================


- Bert -