Contributing to Pharo

Re: Contributing to Pharo

demarey
Hi Thierry,

Just some thoughts I wanted to share:

On 3 Feb 2016, at 10:18, Thierry Goubier wrote:

> I went through all the different possible file formats, class-based, package-based, method-based, log metadata and the like, and I concluded that:
>
> - the method-based format is as good as any other; even better, since it has a spec (Cypress).

I see drawbacks that a class (or package) format would not have. A one-file-per-method approach generates plenty of small files, and file systems generally do not like that:
- it may consume a lot of space. I remember a Java/Maven project with so many small files that it filled the inode table on my Unix system.
- you generate long paths. Long paths are not user-friendly, and some OSes have restrictions on path length.
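To put rough numbers on the file-count and path-length concern, here is a small Python sketch that lists every path a one-file-per-method layout would write for a package. The layout (Package.package/Class.class/instance or class/selector.st, plus a few per-class metadata files) is modeled loosely on FileTree; the metadata file names and the colon-to-dot mapping for keyword selectors are assumptions, and the package, classes, and selectors are made-up examples.

```python
from dataclasses import dataclass, field

# Assumed per-class metadata files; the real FileTree set may differ.
CLASS_METADATA = ("README.md", "properties.json", "methodProperties.json")


@dataclass
class ClassSpec:
    name: str
    instance_selectors: list[str] = field(default_factory=list)
    class_selectors: list[str] = field(default_factory=list)


def method_file(selector: str) -> str:
    # Assumption: colons in keyword selectors become dots to stay filesystem-safe.
    return selector.replace(":", ".") + ".st"


def layout(package: str, classes: list[ClassSpec]) -> list[str]:
    """Return every relative path a one-file-per-method layout would write."""
    root = f"{package}.package"
    paths = []
    for cls in classes:
        class_dir = f"{root}/{cls.name}.class"
        paths += [f"{class_dir}/{meta}" for meta in CLASS_METADATA]
        paths += [f"{class_dir}/instance/{method_file(s)}" for s in cls.instance_selectors]
        paths += [f"{class_dir}/class/{method_file(s)}" for s in cls.class_selectors]
    return paths


# Hypothetical package with two classes and six methods in total.
classes = [
    ClassSpec("OrderedCollection", ["add:", "removeFirst", "at:put:"], ["new:"]),
    ClassSpec("Dictionary", ["at:ifAbsent:", "keysAndValuesDo:"]),
]
paths = layout("Collections-Sequenceable", classes)
print(len(paths))           # 12 files (12 inodes) for 6 methods
print(max(paths, key=len))  # the longest relative path
```

Scaling the same arithmetic up to a full image is what turns the inode and path-length questions into concrete numbers; the repository's own path prefix has to be added to the lengths shown.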

By adopting a one-file-per-method approach, you also move further away from a common script format for Smalltalk, by which I mean a file where you could define classes and methods and run arbitrary portions of Smalltalk code.

> - the method-based format allows method-history queries on the git/VCS history (as well as class-based / package-based queries).
> - the tree structure on GitHub or Bitbucket is quite convenient (and browsable), to the point that one could edit a package directly in it (I do when I need a quick fix).

but it is a pain to navigate: too many clicks to actually browse a method's content.


I do not know what the best format would be, but I think we need to take care not to generate too many files / folders. File systems and VCSs will appreciate it too.

Cheers,
Christophe

Re: Contributing to Pharo

Thierry Goubier
On 05/02/2016 11:33, Christophe Demarey wrote:

> Hi Thierry,
>
> Just some thoughts I wanted to share:
>
> On 3 Feb 2016, at 10:18, Thierry Goubier wrote:
>
>> I went through all the different possible file formats,
>> class-based, package-based, method-based, log metadata and the
>> like, and I concluded that:
>>
>> - the method-based format is as good as any other; even better,
>> since it has a spec (Cypress).
>
> I see drawbacks that a class (or package) format would not have. A
> one-file-per-method approach generates plenty of small files, and
> file systems generally do not like that:
> - it may consume a lot of space. I remember a Java/Maven project
> with so many small files that it filled the inode table on my Unix
> system.
> - you generate long paths. Long paths are not user-friendly, and some
> OSes have restrictions on path length.

The method-based structure of FileTree is very close to how code is
navigated in Smalltalk browsers: one method at a time, with a
package/class/protocol hierarchical layering on top. The
one-file-per-class / one-file-per-package approach harks back to the
base unit of C / C++.

And no OS in general use has path restrictions that matter. OK, the
Windows VM has issues, but that is a VM bug, not a filesystem issue.

> By adopting a one-file-per-method approach, you also move further
> away from a common script format for Smalltalk, by which I mean a
> file where you could define classes and methods and run arbitrary
> portions of Smalltalk code.

This format is called fileout, and it already exists.

All you describe is also available in the FileTree/Cypress format and is
technically better specified.

>> - the method-based format allows method-history queries on the
>> git/VCS history (as well as class-based / package-based queries).
>> - the tree structure on GitHub or Bitbucket is quite convenient (and
>> browsable), to the point that one could edit a package directly in it
>> (I do when I need a quick fix).
>
> but it is a pain to navigate: too many clicks to actually browse a
> method's content.

You must hate Nautilus, then, since this is Nautilus' approach as well.
Just count the number of clicks you make in Nautilus, and the number on
GitHub.

If we remove the instance sub-directory and write instance-side methods
just below the class directory in FileTree, then you'll get exactly the
same number of clicks to reach a method as in Nautilus.

Fun fact: if you do that with the Mac Finder in NeXT mode (Miller
columns) over a FileTree repository, you'll see that it looks almost like
Nautilus' top panes.

> I do not know what the best format would be, but I think we need to
> take care not to generate too many files / folders. File systems
> and VCSs will appreciate it too.

I'd say, overall, what we need to remember is that we produce far
fewer lines of code than other languages, and that we shouldn't
over-optimize.

I'll probably look into optimising FileTree-like writing in the future;
I wasn't that good at planning for it, and it shows in specific cases.

Thierry

Re: Contributing to Pharo

demarey

On 5 Feb 2016, at 14:33, Thierry Goubier wrote:

> On 05/02/2016 11:33, Christophe Demarey wrote:
>> Hi Thierry,
>>
>> Just some thoughts I wanted to share:
>>
>> On 3 Feb 2016, at 10:18, Thierry Goubier wrote:
>>
>>> I went through all the different possible file formats,
>>> class-based, package-based, method-based, log metadata and the
>>> like, and I concluded that:
>>>
>>> - the method-based format is as good as any other; even better,
>>> since it has a spec (Cypress).
>>
>> I see drawbacks that a class (or package) format would not have. A
>> one-file-per-method approach generates plenty of small files, and
>> file systems generally do not like that:
>> - it may consume a lot of space. I remember a Java/Maven project
>> with so many small files that it filled the inode table on my Unix
>> system.
>> - you generate long paths. Long paths are not user-friendly, and some
>> OSes have restrictions on path length.
>
> The method-based structure of FileTree is very close to how code is navigated in Smalltalk browsers: one method at a time, with a package/class/protocol hierarchical layering on top. The one-file-per-class / one-file-per-package approach harks back to the base unit of C / C++.

I fully agree with that.
As we have a lot of (small) methods, we will have a lot of small files, and some file systems do not like that. I remember a huge slowdown because of it. It is good to keep that in mind.

> And no OS in general use has path restrictions that matter. OK, the Windows VM has issues, but that is a VM bug, not a filesystem issue.

The Windows command line has (had) this limitation.

>
>> By adopting a one-file-per-method approach, you also move further
>> away from a common script format for Smalltalk, by which I mean a
>> file where you could define classes and methods and run arbitrary
>> portions of Smalltalk code.
>
> This format is called fileout, and it already exists.

I mean something like a Python script: http://archive.stsci.edu/vo/python_examples.html

> All you describe is also available in the FileTree/Cypress format and is technically better specified.
>
>>> - the method-based format allows method-history queries on the
>>> git/VCS history (as well as class-based / package-based queries).
>>> - the tree structure on GitHub or Bitbucket is quite convenient (and
>>> browsable), to the point that one could edit a package directly in it
>>> (I do when I need a quick fix).
>>
>> but it is a pain to navigate: too many clicks to actually browse a
>> method's content.
>
> You must hate Nautilus, then, since this is Nautilus' approach as well. Just count the number of clicks you make in Nautilus, and the number on GitHub.

But we have Spotter! (I just miss exact-match search, so that I don't have to click and scroll so much.)

> If we remove the instance sub-directory and write instance-side methods just below the class directory in FileTree, then you'll get exactly the same number of clicks to reach a method as in Nautilus.

That would be a good idea.

> Fun fact: if you do that with the Mac Finder in NeXT mode (Miller columns) over a FileTree repository, you'll see that it looks almost like Nautilus' top panes.
>
>> I do not know what the best format would be, but I think we need to
>> take care not to generate too many files / folders. File systems
>> and VCSs will appreciate it too.
>
> I'd say, overall, what we need to remember is that we produce far fewer lines of code than other languages, and that we shouldn't over-optimize.
>
> I'll probably look into optimising FileTree-like writing in the future; I wasn't that good at planning for it, and it shows in specific cases.

That is actually the problem: we generate a lot of small files.
I do not have numbers, but I think it would be good to stress-test a file system a bit to see where we hit the limit, and compare that with the size of the Pharo code base. On the git side, I'm not aware of any limitation regarding small files.
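A minimal way to run such a stress test, sketched in Python under stated assumptions: it writes the same number of bytes once as many small files spread over class-like subdirectories, and once as a single file, inside a temporary directory, and times both. The directory naming, the 50-methods-per-class split, and the method count and size are hypothetical placeholders, not measurements of the Pharo code base; results depend heavily on the OS, the filesystem, and the page cache.

```python
import tempfile
import time
from pathlib import Path


def write_many_small_files(root: Path, count: int, size: int) -> None:
    """Write `count` files of `size` bytes, 50 per class-like directory (assumed split)."""
    payload = b"x" * size
    for i in range(count):
        method_dir = root / f"Class{i // 50}.class" / "instance"
        method_dir.mkdir(parents=True, exist_ok=True)
        (method_dir / f"method{i}.st").write_bytes(payload)


def write_one_file(root: Path, count: int, size: int) -> None:
    """Write the same total number of bytes into a single file."""
    root.mkdir(parents=True, exist_ok=True)
    with (root / "package.st").open("wb") as f:
        for _ in range(count):
            f.write(b"x" * size)


def run_benchmark(count: int, size: int) -> dict[str, float]:
    with tempfile.TemporaryDirectory() as tmp:
        per_method, per_package = Path(tmp) / "per-method", Path(tmp) / "per-package"

        start = time.perf_counter()
        write_many_small_files(per_method, count, size)
        small_s = time.perf_counter() - start

        start = time.perf_counter()
        write_one_file(per_package, count, size)
        single_s = time.perf_counter() - start

        # Sanity checks, done outside the timed sections.
        files = sum(1 for p in per_method.rglob("*.st") if p.is_file())
        total = (per_package / "package.st").stat().st_size
    return {"files": files, "bytes": total, "small_files_s": small_s, "single_file_s": single_s}


# Hypothetical sizes: 5,000 methods of about 300 bytes each.
print(run_benchmark(count=5_000, size=300))
```

Comparing the two timings (and, on Unix-like systems, watching df -i for inode usage) gives a first idea of where a filesystem starts to hurt. Note that this measures raw filesystem cost only, not any particular FileTree writer.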




Re: Contributing to Pharo

Thierry Goubier
On 05/02/2016 14:55, Christophe Demarey wrote:

>
> On 5 Feb 2016, at 14:33, Thierry Goubier wrote:
>
>> On 05/02/2016 11:33, Christophe Demarey wrote:
>>> Hi Thierry,
>>>
>>> Just some thoughts I wanted to share:
>>>
>>> On 3 Feb 2016, at 10:18, Thierry Goubier wrote:
>>>
>>>> I went through all the different possible file formats,
>>>> class-based, package-based, method-based, log metadata and the
>>>> like, and I concluded that:
>>>>
>>>> - the method-based format is as good as any other; even better,
>>>> since it has a spec (Cypress).
>>>
>>> I see drawbacks that a class (or package) format would not have. A
>>> one-file-per-method approach generates plenty of small files, and
>>> file systems generally do not like that:
>>> - it may consume a lot of space. I remember a Java/Maven project
>>> with so many small files that it filled the inode table on my Unix
>>> system.
>>> - you generate long paths. Long paths are not user-friendly, and
>>> some OSes have restrictions on path length.
>>
>> The method-based structure of FileTree is very close to how code is
>> navigated in Smalltalk browsers: one method at a time, with a
>> package/class/protocol hierarchical layering on top. The
>> one-file-per-class / one-file-per-package approach harks back to the
>> base unit of C / C++.
>
> I fully agree with that. As we have a lot of (small) methods, we will
> have a lot of small files, and some file systems do not like that. I
> remember a huge slowdown because of it. It is good to keep that in
> mind.

The problem is linked to writing too many files. Because of possible
uncertainty about the on-disk state, FileTree erases the complete package
directory and then rewrites everything, letting the VCS decide what has
really changed. This is doubly slow, because it hits both the filesystem
and the VCS.

I told Dale I'd look into a diff-based writer; it should improve
things a lot.
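To make the contrast with erase-and-rewrite concrete, here is a rough Python sketch of what a diff-based writer could look like: it compares each file's intended content with what is already on disk, rewrites only the files that changed, and deletes files that no longer correspond to anything in the package. The function name, the path-to-source mapping, and the sample package are hypothetical; this is not FileTree's actual code.

```python
import tempfile
from pathlib import Path


def write_package_diff(package_dir: Path, files: dict[str, str]) -> dict[str, list[str]]:
    """Sync package_dir with `files` (relative path -> source text), touching only
    what changed. Hypothetical helper, not FileTree's implementation."""
    package_dir.mkdir(parents=True, exist_ok=True)
    written, deleted = [], []

    # Write only new or changed files; unchanged files are left alone,
    # so neither the filesystem nor the VCS sees any activity for them.
    for rel, text in files.items():
        target = package_dir / rel
        if target.is_file() and target.read_text(encoding="utf-8") == text:
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text, encoding="utf-8")
        written.append(rel)

    # Delete files on disk that no longer correspond to any method or metadata.
    # (Empty directories left behind are not pruned in this sketch.)
    wanted = set(files)
    for path in sorted(package_dir.rglob("*")):
        rel = path.relative_to(package_dir).as_posix()
        if path.is_file() and rel not in wanted:
            path.unlink()
            deleted.append(rel)

    return {"written": written, "deleted": deleted}


with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "Demo.package"
    first = write_package_diff(pkg, {
        "Foo.class/instance/a.st": "a\n\t^ 1",
        "Foo.class/instance/b.st": "b\n\t^ 2",
    })
    # Second save: a was edited, b was removed from the image.
    second = write_package_diff(pkg, {"Foo.class/instance/a.st": "a\n\t^ 10"})

print(second)  # {'written': ['Foo.class/instance/a.st'], 'deleted': ['Foo.class/instance/b.st']}
```

The trade-off is that such a writer has to trust the on-disk state it compares against, which is precisely the uncertainty the erase-and-rewrite strategy sidesteps.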

>> And no OS in general use has path restrictions that matter. OK, the
>> Windows VM has issues, but that is a VM bug, not a filesystem
>> issue.
>
> Windows command-line has (had) this limitation.

Good to know.

>>> By adopting a one-file-per-method approach, you also move further
>>> away from a common script format for Smalltalk, by which I mean a
>>> file where you could define classes and methods and run arbitrary
>>> portions of Smalltalk code.
>>
>> This format is called fileout, and it already exists.
>
> I mean something like a Python script:
> http://archive.stsci.edu/vo/python_examples.html

I'm not entirely keen on going that way. I prefer declarative formats for
storing packages. And I still think that the fileout format already is
such a script format (a sequence of scripts to execute, separated by !).
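For readers unfamiliar with the fileout (chunk) format: a single ! ends each chunk, and a doubled !! inside a chunk stands for a literal !. The ! that introduces a methodsFor: preamble and the closing ! ! of a method section show up, in the tokenizer sketched below, as empty chunks. This is a minimal Python illustration under those assumptions; it only splits the text into chunks and does not evaluate anything, unlike a real fileIn. The Greeter class is a made-up example.

```python
def read_chunks(source: str) -> list[str]:
    """Split chunk-format (fileout) text into chunks.

    A single '!' ends a chunk; '!!' inside a chunk stands for a literal '!'.
    Chunks are returned stripped of surrounding whitespace, so the bang that
    introduces a preamble and the closing '! !' both appear as empty strings.
    """
    chunks: list[str] = []
    current: list[str] = []
    i = 0
    while i < len(source):
        if source.startswith("!!", i):
            current.append("!")  # escaped bang: part of the chunk text
            i += 2
        elif source[i] == "!":
            chunks.append("".join(current).strip())  # chunk terminator
            current = []
            i += 1
        else:
            current.append(source[i])
            i += 1
    tail = "".join(current).strip()
    if tail:
        chunks.append(tail)  # trailing text without a terminating bang
    return chunks


sample = (
    "Object subclass: #Greeter\n"
    "\tinstanceVariableNames: ''\n"
    "\tclassVariableNames: ''\n"
    "\tpoolDictionaries: ''\n"
    "\tcategory: 'Demo'!\n"
    "\n"
    "!Greeter methodsFor: 'printing'!\n"
    "greet\n"
    "\t^ 'Hello!!'! !\n"
)
chunks = read_chunks(sample)
for chunk in chunks:
    print(repr(chunk))
```

The sample yields five chunks: the class definition, an empty chunk, the methodsFor: preamble, the method source (with its !! unescaped to !), and the empty chunk that closes the section.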

>> All you describe is also available in the FileTree/Cypress format
>> and is technically better specified.
>>
>>>> - the method-based format allows method-history queries on the
>>>> git/VCS history (as well as class-based / package-based
>>>> queries).
>>>> - the tree structure on GitHub or Bitbucket is quite
>>>> convenient (and browsable), to the point that one could edit a
>>>> package directly in it (I do when I need a quick fix).
>>>
>>> but it is a pain to navigate: too many clicks to actually browse
>>> a method's content.
>>
>> You must hate Nautilus, then, since this is Nautilus' approach as
>> well. Just count the number of clicks you make in Nautilus, and the
>> number on GitHub.
>
> But we have Spotter! (I just miss exact-match search, so that I don't
> have to click and scroll so much.)

Then you want Spotter on the web :)

>> If we remove the instance sub-directory and write instance-side
>> methods just below the class directory in FileTree, then you'll get
>> exactly the same number of clicks to reach a method as in Nautilus.
>
> That would be a good idea.

Why not.

>> Fun fact: if you do that with the Mac Finder in NeXT mode (Miller
>> columns) over a FileTree repository, you'll see that it looks almost
>> like Nautilus' top panes.
>>
>>> I do not know what the best format would be, but I think we need
>>> to take care not to generate too many files / folders. File
>>> systems and VCSs will appreciate it too.
>>
>> I'd say, overall, what we need to remember is that we produce far
>> fewer lines of code than other languages, and that we shouldn't
>> over-optimize.
>>
>> I'll probably look into optimising FileTree-like writing in the
>> future; I wasn't that good at planning for it, and it shows in
>> specific cases.
>
> That is actually the problem: we generate a lot of small files. I do
> not have numbers, but I think it would be good to stress-test a file
> system a bit to see where we hit the limit, and compare that with the
> size of the Pharo code base. On the git side, I'm not aware of any
> limitation regarding small files.

I'm sure the numbers are already available. And, as I said above, you
may be measuring limitations of the FileTree implementation rather than
anything related to filesystem (or git) issues.

Thierry
