Trouble opening large files

Trouble opening large files

Evan Donahue
Hello, I've run into some odd behavior and wanted to check whether I might be missing something:

I have downloaded a copy of the English Wikipedia as an XML file and am hoping to (SAX) parse it. However, I can't even seem to get Pharo to recognize that the file exists.

If I open FileSystem disk root in the playground and navigate, attempting to enter the folder containing the (57G) XML file fails with "MessageNotUnderstood: False>>humanReadableSIByteSize". Likewise, if I get a FileReference with FileSystem disk root / 'path' / 'to' / 'file', then self exists returns false and the parser fails.

Am I doing something wrong? Should I be able to do this?

The version number is #40283

Thanks,
Evan

Re: Trouble opening large files

Tudor Girba-2
Hi,

If I understand correctly, the failure occurs while navigating in the "Items" presentation.

I cannot reproduce this problem because I do not have enough disk space for such a large file :). But, could you do the following and let me know what the outcome is:

'path/to/your/large/file.xml' asFileReference humanReadableSize

?

Cheers,
Doru



--

"Every thing has its own flow"

Re: Trouble opening large files

Evan Donahue
Hi, thanks for the reply.

The response is the same: "MessageNotUnderstood: False>>humanReadableSIByteSize."

This happens with both print-it and do-it-and-go. Running the same command on the neighboring "wiki.torrent" torrent file yields the correct 54kb.

Thanks,
Evan


Re: Trouble opening large files

Nicolai Hess
Which OS?
Can you check with other programs whether this file is readable at all?




Nicolai



Re: Trouble opening large files

Evan Donahue
The OS is Arch Linux.

I can read the file with less.

The problem, insofar as I can trace it, seems to stem from this line in UnixStore:

Primitives lookupDirectory: encodedPath filename: encodedBasename

When my 57G file is there, this line returns nil. If I move the 57G file away and create a small file with the same name, the same call successfully finds the file. I am not sure how large a file must be to cause this issue, but a 1.5G file works fine.


Re: Trouble opening large files

Nicolai Hess
There is a bug report on Mantis for Squeak's Unix VM.
I think this applies to Pharo too, although I don't know whether this
bug is still valid on a recent Squeak VM.


Re: [squeak-dev] Re: Trouble opening large files

Eliot Miranda-2
Hi Nicolai,

On Tue, Oct 14, 2014 at 11:26 AM, Nicolai Hess <[hidden email]> wrote:
There is a bug report on mantis for squeaks unix vm.
I think this applies to pharo too, although I don't know if this
bug is still valid on recent squeak vm.

Yes, one must compile with -D_FILE_OFFSET_BITS=64. The Cog VMs are also built with -D_GNU_SOURCE.

Here's a line from a Squeak file list on the current Cog VM:

(2014.10.11 07:00:46 7,115,143,880) Formula1.2014.Round16.Russia.Qualifying.BBCOneHD.1080i.H264.English-wserhkzt.ts 

No 32-bit limit here.
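
The sizes reported in this thread are consistent with a 32-bit file-offset limit. The following is a rough arithmetic sketch, not taken from the VM sources: it assumes the affected VM is a 32-bit build compiled without -D_FILE_OFFSET_BITS=64, so that off_t is a signed 32-bit integer, and it reads "G" as GiB. On Linux, stat() fails with EOVERFLOW when a file's size cannot be represented in off_t, which would fit with the directory lookup returning nil. The names MAX_OFF_T_32 and fits_in_32bit_off_t are made up for this illustration.

```python
# Largest file size representable in a signed 32-bit off_t
# (assumption: a 32-bit build without -D_FILE_OFFSET_BITS=64).
MAX_OFF_T_32 = 2**31 - 1  # 2,147,483,647 bytes, just under 2 GiB

GIB = 2**30  # assumption: "G" in the thread means GiB


def fits_in_32bit_off_t(size_in_bytes):
    """Return True if the size is representable in a signed 32-bit off_t."""
    return size_in_bytes <= MAX_OFF_T_32


# File sizes mentioned in the thread.
sizes = {
    "1.5G file (reported working)": int(1.5 * GIB),
    "57G Wikipedia dump (reported failing)": 57 * GIB,
    "7,115,143,880-byte file from the Squeak listing": 7_115_143_880,
}

for label, size in sizes.items():
    print(f"{label}: {size:,} bytes, fits: {fits_in_32bit_off_t(size)}")
```

Under these assumptions, only the 1.5G file fits; the other two need a 64-bit off_t.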



--
best,
Eliot

Re: Trouble opening large files

Nicolai Hess
I rebuilt the Pharo VM with the -D_FILE_OFFSET_BITS=64 option:
listing directories that contain (very) large files works now.




Re: Trouble opening large files

Evan Donahue
64 bits! That makes sense. I don't generally work with large files either. I have been getting my VM from get.pharo.org, so I will first need to get the source in order to build the VM. A quick survey over the last few hours has revealed a multitude of VMs, projects, platforms, repositories, and versions that I, in my Pharo ignorance, cannot differentiate. Could someone please point me to the source I should be using to build the VM for the pharo40 image I have been pulling from get.pharo.org?

Thank you,
Evan


Re: Trouble opening large files

Nicolai Hess
Easiest way: download the generated source from
http://files.pharo.org/vm/src/vm-unix-sources/blessed/


Re: Trouble opening large files

hernanmd


2014-10-13 21:27 GMT-03:00 Evan Donahue <[hidden email]>:
Hello, I've run into some odd behavior and wanted to check whether I might be missing something:

I have downloaded a copy of the english wikipedia as an xml file and am hoping to (sax) parse it. However, I can't even seem to get pharo to recognize that the file exists.


Just for curiosity's sake, is there a reason why you don't query DBpedia through SPARQL?

Cheers,

Hernán



Re: Trouble opening large files

Evan Donahue
Certainly, thanks for the curiosity.

I am processing natural language statistics over the entire wiki corpus, not querying for specific entries. The information and entries aren't important so much as the raw quantity of words. A simple stream through a file on disk is all I need.

Thanks,
Evan


Re: [Bulk] Trouble opening large files

Yanni Chiu

On Oct 15, 2014, at 1:02 AM, Evan Donahue <[hidden email]> wrote:

>
> I am processing natural language statistics over the entire wiki corpus, not querying for specific entries. The information and entries aren't important so much as the raw quantity of words. A simple stream through a file on disk is all I need.

Maybe you can work around the problem by writing a shell script that writes the file over a pipe or socket.

Then, on the Pharo side, connect to the pipe/socket and read the data stream.
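
The reading side of such a workaround could look roughly like the sketch below. It is written in Python rather than Pharo purely for illustration, and the names count_words and chunk_size are hypothetical. It consumes any text stream in fixed-size chunks, so the reader never has to stat or open the large file itself, and it carries a partial word across chunk boundaries so that words split between two reads are counted once.

```python
import io


def count_words(stream, chunk_size=1 << 20):
    """Count whitespace-separated words in a text stream, one chunk at a time."""
    total = 0
    carry = ""  # trailing partial word left over from the previous chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        text = carry + chunk
        words = text.split()
        if text[-1].isspace():
            # The chunk ended on a word boundary; nothing to carry over.
            carry = ""
        else:
            # The last word may continue in the next chunk; hold it back.
            carry = words.pop() if words else ""
        total += len(words)
    if carry:
        total += 1  # the final word, if the stream did not end in whitespace
    return total


# Small in-memory demonstration; a tiny chunk size forces words to straddle chunks.
print(count_words(io.StringIO("a bb  ccc\ndd"), chunk_size=3))  # prints 4
```

To use it with a pipe, one would pass sys.stdin instead of the in-memory stream and feed the file in from the shell, for example with cat on the sending side.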



Re: Trouble opening large files

hernanmd
Cool project! Let me know how it goes.
Cheers,

Hernán
