Hello, I've run into some odd behavior and wanted to check whether I might be missing something: I have downloaded a copy of the English Wikipedia as an XML file and am hoping to (SAX) parse it. However, I can't even seem to get Pharo to recognize that the file exists. The version number is #40283. Thanks,
Hi, If I understand correctly, the failure occurs while navigating in the "Items" presentation. I cannot reproduce this problem because I do not have enough disk space for such a large file :). But, could you do the following and let me know what the outcome is: 'path/to/your/large/file.xml' asFileReference humanReadableSize ? Cheers, Doru On Tue, Oct 14, 2014 at 2:27 AM, Evan Donahue <[hidden email]> wrote:
"Every thing has its own flow"
Hi, thanks for the reply. The response is the same: "MessageNotUnderstood: False>>humanReadableSIByteSize." Evan On Tue, Oct 14, 2014 at 12:24 AM, Tudor Girba <[hidden email]> wrote:
2014-10-14 6:38 GMT+02:00 Evan Donahue <[hidden email]>:
Which OS? Can you check with other programs whether this file is readable at all? Nicolai
The OS is Arch Linux. I can read the file with less.

Primitives lookupDirectory: encodedPath filename: encodedBasename

On Tue, Oct 14, 2014 at 8:17 AM, Nicolai Hess <[hidden email]> wrote:
There is a bug report on Mantis for Squeak's unix VM. I think this applies to Pharo too, although I don't know if this

2014-10-14 17:43 GMT+02:00 Evan Donahue <[hidden email]>:
Hi Nicolai,
On Tue, Oct 14, 2014 at 11:26 AM, Nicolai Hess <[hidden email]> wrote:
Yes, one must compile with -D_FILE_OFFSET_BITS=64. The Cog VMs are also built with -D_GNU_SOURCE. Here's a line from a Squeak file list on the current Cog VM: (2014.10.11 07:00:46 7,115,143,880) Formula1.2014.Round16.Russia.Qualifying.BBCOneHD.1080i.H264.English-wserhkzt.ts. No 32-bit limit here.
best, Eliot
In reply to this post by Nicolai Hess
Rebuilt the Pharo VM with the -D_FILE_OFFSET_BITS=64 option: reading and listing directories with (very) large files works now. 2014-10-14 20:26 GMT+02:00 Nicolai Hess <[hidden email]>:
64 bits! That makes sense. I don't generally work with large files either. I have been getting my VM from get.pharo.org, so I will need to get the source to build the VM in the first place. A quick survey over the last few hours has revealed a multitude of VMs, projects, platforms, repositories, and versions that I, in my Pharo ignorance, cannot differentiate. Could someone please point me to the source I should be using to build the VM for the pharo40 image I have been pulling off get.pharo.org? Thank you, Evan On Tue, Oct 14, 2014 at 3:02 PM, Nicolai Hess <[hidden email]> wrote:
Easiest way: download the generated source from http://files.pharo.org/vm/src/vm-unix-sources/blessed/ 2014-10-14 22:35 GMT+02:00 Evan Donahue <[hidden email]>:
In reply to this post by Evan Donahue
2014-10-13 21:27 GMT-03:00 Evan Donahue <[hidden email]>:
Just for curiosity's sake, is there a reason why you don't query DBpedia via SPARQL? Cheers, Hernán
Certainly, thanks for the curiosity. I am processing natural language statistics over the entire wiki corpus, not querying for specific entries. The information and entries aren't important so much as the raw quantity of words. A simple stream through a file on disk is all I need. Evan On Wed, Oct 15, 2014 at 12:49 AM, Hernán Morales Durand <[hidden email]> wrote:
On Oct 15, 2014, at 1:02 AM, Evan Donahue <[hidden email]> wrote:
> I am processing natural language statistics over the entire wiki corpus, not querying for specific entries. The information and entries aren't important so much as the raw quantity of words. A simple stream through a file on disk is all I need.

Maybe you can work around the problem by writing a shell script that writes the file over a pipe or socket. Then, on the Pharo side, connect to the pipe/socket and read the data stream.
In reply to this post by Evan Donahue
Cool project! Let me know how you go. Cheers, Hernán 2014-10-15 2:02 GMT-03:00 Evan Donahue <[hidden email]>: