Source file index encoding rant

Igor Stasenko
Hello,

Encoding the file index in the highest bits of the source pointer prevents
us from having a source file position > 32M :(

I mean, if we encoded the pointer as:

(filePosition << 1) + (fileIndex - 1) "file index 1 or 2"
then there would be no limit on the sizes of the .changes and .sources
files, because such a value is easy to represent as a byte array, and it
would only mean longer trailer bytes in the compiled method if the
source file position is above 32M.

But the way it is encoded currently:

index * 16r1000000 + position

means that we're unable to have a file position of more than 32M :(

See implementation of #sourcePointerFromFileIndex:andPosition:

This should be fixed!
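
To make the difference concrete, here is a minimal sketch of the two formulas above, taken literally as written. It is in Python rather than Smalltalk, the function names are invented for illustration, and it is not the actual Squeak implementation, which may treat the index bits differently.

```python
from typing import Tuple

# Sketch only: both formulas are copied literally from the post above.
# All function names here are hypothetical.


def encode_current(file_index: int, position: int) -> int:
    """index * 16r1000000 + position (16r1000000 is 2**24)."""
    return file_index * 0x1000000 + position


def encode_proposed(file_index: int, position: int) -> int:
    """(filePosition << 1) + (fileIndex - 1), with file index 1 or 2."""
    if file_index not in (1, 2):
        raise ValueError("file index must be 1 or 2")
    return (position << 1) + (file_index - 1)


def decode_proposed(pointer: int) -> Tuple[int, int]:
    """Return (file_index, position) for a pointer built by encode_proposed."""
    return (pointer & 1) + 1, pointer >> 1


def as_trailer_bytes(pointer: int) -> bytes:
    """Big-endian bytes, using as many as the value needs (at least one)."""
    length = max(1, (pointer.bit_length() + 7) // 8)
    return pointer.to_bytes(length, "big")


if __name__ == "__main__":
    big = 100 * 1024 * 1024  # a position 100 MiB into the file

    # Taken literally, a large position spills into the index bits:
    print(encode_current(1, 0x1000000) == encode_current(2, 0))  # True

    # The proposed layout round-trips regardless of the position's size:
    print(decode_proposed(encode_proposed(2, big)) == (2, big))  # True

    # Larger positions simply need more bytes:
    print(len(as_trailer_bytes(encode_proposed(1, 1000))))
    print(len(as_trailer_bytes(encode_proposed(2, big))))
```

With the index in the lowest bit, the position can grow without bound and only the byte length of the encoded value changes.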

--
Best regards,
Igor Stasenko AKA sig.

Re: Source file index encoding rant

Levente Uzonyi-2
On Sat, 12 Dec 2009, Igor Stasenko wrote:

> Hello,
>
> Encoding the file index in the highest bits of the source pointer prevents
> us from having a source file position > 32M :(
>
> I mean, if we encoded the pointer as:
>
> (filePosition << 1) + (fileIndex - 1) "file index 1 or 2"
> then there would be no limit on the sizes of the .changes and .sources
> files, because such a value is easy to represent as a byte array, and it
> would only mean longer trailer bytes in the compiled method if the
> source file position is above 32M.
>
> But the way it is encoded currently:
>
> index * 16r1000000 + position
>
> means that we're unable to have a file position of more than 32M :(
>
> See implementation of #sourcePointerFromFileIndex:andPosition:
>
> This should be fixed!
>

+1

Also source file handling needs cleanup. I don't know anything about the
original design of the source file handling (and the decisions behind it),
but the current implementation has a few unused capabilities (more than 2
source files, multiple source file array implementation) and rarely used
features (in-memory source streams). Removing (some of) these could
simplify the implementation. Also it's a bit cryptic to use
SourceFiles at: 2 instead of SourceFiles changesFile.
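
As a rough sketch of that last point, named accessors over the two slots could look like the following. It is written in Python rather than Smalltalk, the class and accessor names are hypothetical, and it assumes slot 1 holds the sources file and slot 2 the changes file.

```python
class SourceFileArray:
    """Hypothetical stand-in for SourceFiles with two named slots.

    Assumption: slot 1 is the sources file, slot 2 is the changes file.
    """

    SOURCES_INDEX = 1
    CHANGES_INDEX = 2

    def __init__(self, sources_file, changes_file):
        self._files = [sources_file, changes_file]

    def at(self, index):
        """1-based access, mirroring `SourceFiles at: index`."""
        if not 1 <= index <= len(self._files):
            raise IndexError("source file index out of range: %r" % (index,))
        return self._files[index - 1]

    @property
    def sources_file(self):
        return self.at(self.SOURCES_INDEX)

    @property
    def changes_file(self):
        return self.at(self.CHANGES_INDEX)


if __name__ == "__main__":
    # Placeholder file names, used only for the demonstration.
    files = SourceFileArray("demo.sources", "demo.changes")
    print(files.at(2) == files.changes_file)  # True
```

The indexed access still works, but callers no longer need to remember which number means which file.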


Levente


Re: Source file index encoding rant

David F-2

On Dec 12, 2009, at 6:43 AM, Levente Uzonyi wrote:

> Also source file handling needs cleanup. I don't know anything about  
> the original design of the source file handling (and the decisions  
> behind it), but the current implementation has a few unused  
> capabilities (more than 2 source files, multiple source file array  
> implementation) and rarely used features (in-memory source streams).  
> Removing (some of) these could simplify the implementation. Also  
> it's a bit cryptic to use
> SourceFiles at: 2 instead of SourceFiles changesFile.

+1

Also it would be very nice to move from carriage returns to line feeds
to delimit lines in these files.  That way, when unix commands like
grep happen to stumble over .changes, they don't output the entire
file.  And, better yet, it would enable the use of standard unix
commands on those files.
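
A small Python sketch of why this happens, using made-up sample bytes rather than a real .changes file: a line-oriented tool splits on LF, so CR-delimited content looks like one huge line. The function name is invented for illustration.

```python
import re


def matching_lines(data: bytes, pattern: bytes) -> list:
    """Roughly what a line-oriented search does: split on LF, keep matches."""
    return [line for line in data.split(b"\n") if re.search(pattern, line)]


if __name__ == "__main__":
    # Made-up sample content: three lines separated by carriage returns.
    cr_data = b"foo: x\r\tbar := x.\rbaz\r"
    lf_data = cr_data.replace(b"\r", b"\n")

    # With CR line ends the whole content is a single "line",
    # so a match returns everything.
    print(matching_lines(cr_data, b"bar"))

    # With LF line ends only the matching line comes back.
    print(matching_lines(lf_data, b"bar"))
```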

Also, I believe there is a bug where Squeak can't recover if it loses
its filehandle to the changes file (e.g. if you are running Squeak off
of a network-mapped drive and you temporarily lose that connection).

David


Re: Source file index encoding rant

Igor Stasenko
2009/12/12 David F <[hidden email]>:

> Also it would be very nice to move from carriage returns to line feeds to
> delimit lines in these files.  That way, when unix commands like grep happen
> to stumble over .changes, they don't output the entire file.  And, better
> yet, it would enable the use of standard unix commands on those files.
>
-1 here. We shouldn't care about the idiosyncrasies of the outside world
more than necessary.
The .changes and .sources files are meant to be used inside Squeak.
They're not meant to be viewed or edited using external tools.

> Also, I believe there is a bug where Squeak can't recover if it loses its
> filehandle to the changes file (e.g. if you are running Squeak off of a
> network-mapped drive and you temporarily lose that connection).
>
That would be nice to fix.

--
Best regards,
Igor Stasenko AKA sig.

Re: Source file index encoding rant

Bert Freudenberg
On 13.12.2009 at 13:12, Igor Stasenko wrote:

>
> 2009/12/12 David F <[hidden email]>:
>> Also it would be very nice to move from carriage returns to line feeds to
>> delimit lines in these files.  That way, when unix commands like grep happen
>> to stumble over .changes, they don't output the entire file.  And, better
>> yet, it would enable the use of standard unix commands on those files.
>>
> -1 here. We shouldn't care about the idiosyncrasies of the outside world
> more than necessary.
> The .changes and .sources files are meant to be used inside Squeak.
> They're not meant to be viewed or edited using external tools.

That was my first reaction too.

But OTOH it doesn't really matter to Squeak which line endings are used by default. I do find myself writing special little scripts to deal with CR files on any new machine I have to work on for a while. It would make it easier for unix people like me, while not making a difference to others who do not use unix shell tools anyway.

- Bert -


Re: Source file index encoding rant

K. K. Subramaniam
On Sunday 13 December 2009 10:34:52 pm Bert Freudenberg wrote:
> But OTOH it doesn't really matter to Squeak which line endings are used by
>  default. I do find myself writing special little scripts to deal with CR
>  files on any new machine I have to work on for a while. It would make it
>  easier for unix people like me, while not making a difference to others
>  who do not use unix shell tools anyway.
Not just Unix. OS X and even Windows have switched to LF as EOL these days. It
matters when editing files on portable media in Workspace or reading large
fileouts. Filtering such files through 'tr \\r \\n' can be skipped if Squeak
could save files with LF EOL.
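
For anyone without tr at hand, the same filter can be sketched in a few lines of Python. The function name and chunk size are arbitrary choices for illustration, and the streams must be opened in binary mode.

```python
import io


def cr_to_lf(src, dst, chunk_size=64 * 1024):
    """Copy src to dst, replacing every CR byte with an LF byte."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk.replace(b"\r", b"\n"))


if __name__ == "__main__":
    # In-memory streams stand in for real files opened with "rb" / "wb".
    src = io.BytesIO(b"first\rsecond\rthird\r")
    dst = io.BytesIO()
    cr_to_lf(src, dst)
    print(dst.getvalue())
```

Because one byte replaces one byte, the output has the same length as the input. Like the tr command, this turns a CRLF pair into two LFs, so it is only appropriate for files that use bare CR line ends.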

Subbu