help wanted: normalising LF on tonel for Pharo project

Re: help wanted: normalising LF on tonel for Pharo project

Thierry Goubier
Hi Ben,

On 11/04/2018 at 16:37, Ben Coman wrote:

>
>
> On 11 April 2018 at 05:05, Esteban Lorenzano <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     Hi,
>
>     I’ve been wondering how to better fix the problem of having windows
>     and linux/macOS people contributing and the fact that files are
>     written in their native system format (crlf windows, lf for the rest
>     of the world).
>
>
>
>
>     I dug a bit and found a couple of links that helped me (after
>     trying to understand the doc):
>     https://stackoverflow.com/questions/170961/whats-the-best-crlf-carriage-return-line-feed-handling-strategy-with-git
>
>     and it seems adding a .gitattributes file with this content:
>
>     # Auto detect text files and perform LF normalization
>     * text=auto
>
>
> I see a few posts around that recommend reading
> http://adaptivepatchwork.com/2012/03/01/mind-the-end-of-your-line/
> which about the above line says... "This is certainly better than
> requiring everyone to be on the same global setting for core.autocrlf,
> but it means that you really trust Git to do binary detection properly.
> In my opinion it is better to explicitly specify your text files that
> you want normalized."
>
> and https://tinyurl.com/ya9xsprx  says "We had a repo with * text=auto,
> and Git guessed wrong for an image file that it was a text file, causing
> it to corrupt it as it replaced CR + LF bytes with LF bytes in the
> object database."
>
> I'm unsure.  Without it the system is subject to different users'
> different global settings
> and I'd guess that may be a more frequent problem than Git guessing
> wrong. The latter
> can be fixed by a user adding an extra  .gitattributes  entry explicitly
> specifying the file was binary,
> whereas the former seems to introduce a confounding factor.
> So probably a good line to have.
>
>     *.st text merge=union eol=lf
>
>     could fix the problem?
>     can someone confirm this?
>
>
> "eol=lf"   looks appropriate...
> https://www.scivision.co/git-line-endings-windows-cygwin-wsl/
>
> Most editors on Windows transparently handle LF line endings.
> https://en.wikipedia.org/wiki/Comparison_of_text_editors#Newline_support
>
>
> " merge=union" I am not familiar with, but I read at...
> https://git-scm.com/docs/gitattributes
> "union = Run 3-way file level merge for text files, but take lines from
> both versions, instead of leaving conflict markers.
> This tends to leave the added lines in the resulting file in random**
> order and the user should verify the result.
> Do not use this if you do not understand the implications."
>
> What are the implications of lines being merged in a random order?
>
>
> btw, has doing a callback from libgit2 to a custom merge driver in Pharo
> been considered?
> https://libgit2.github.com/libgit2/#HEAD/group/callback/git_merge_driver_apply_fn

There is a merge driver for parts of the FileTree format, implemented
in Pharo; it could be done on a more general basis if the Tonel format
exhibits more conflicts than usual.

But (and this is a big "but"), repositories that mix Pharo with other
things and contain very large files to merge could make things very hard
for a Smalltalk-implemented merge algorithm.

In most (all?) of my professional work, this is the case. I have among my
projects a mix of FPGA design (Verilog + VHDL) + C (drivers, runtime) +
Smalltalk, and the Smalltalk part is small.

> btw2, I found (https://githubengineering.com/move-fast/) interesting...
> saying... "Despite being a C library, libgit2 contains many powerful
> abstractions to accomplish complex tasks that Git simply cannot do. One
> of these features are indexes that exist solely in memory and allow
> work-tree related operations to be performed without an actual working
> directory. [...]  With the in-memory index, libgit2 is capable of
> merging two trees in a repository without having to check out any of
> their files on disk."

Yes, I considered that for GitFileTree. The current version uses
fast-import and archive (to write and read, respectively) and in truth
could work on a bare repository, without a working tree.

Oh, by the way: it also solves the #lf issue, because you do everything
the unix way, even on windows: GitFileTree now never touches a file of
the host system.

Thierry

>
>
> On 11 April 2018 at 05:55, Esteban Lorenzano <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     or a .iceberg file?
>
>     Esteban
>
>     ps: yep, we need it… we will have it, why not start now?
>
>
> Do you mean Iceberg would clone a repo, and from its included  .iceberg  
> file
> a matching  .gitattributes  file would be created?
> That seems like double handling.
> Why not have the user edit the  .gitattributes  file directly from Iceberg?
> Iceberg might provide some appropriate templates.
>
>
> cheers -ben
>



Re: help wanted: normalising LF on tonel for Pharo project

Ben Coman


On 12 April 2018 at 01:47, Thierry Goubier <[hidden email]> wrote:
Hi Ben,

On 11/04/2018 at 16:37, Ben Coman wrote:


On 11 April 2018 at 05:05, Esteban Lorenzano <[hidden email] <mailto:[hidden email]>> wrote:

    Hi,

    I’ve been wondering how to better fix the problem of having windows
    and linux/macOS people contributing and the fact that files are
    written in their native system format (crlf windows, lf for the rest
    of the world).



    I dug a bit and found a couple of links that helped me (after
    trying to understand the doc):
    https://stackoverflow.com/questions/170961/whats-the-best-crlf-carriage-return-line-feed-handling-strategy-with-git

    and it seems adding a .gitattributes file with this content:

    # Auto detect text files and perform LF normalization
    * text=auto


I see a few posts around that recommend reading http://adaptivepatchwork.com/2012/03/01/mind-the-end-of-your-line/
which about the above line says... "This is certainly better than requiring everyone to be on the same global setting for core.autocrlf, but it means that you really trust Git to do binary detection properly. In my opinion it is better to explicitly specify your text files that you want normalized."

and https://tinyurl.com/ya9xsprx  says "We had a repo with * text=auto, and Git guessed wrong for an image file that it was a text file, causing it to corrupt it as it replaced CR + LF bytes with LF bytes in the object database."

I'm unsure.  Without it the system is subject to different users' different global settings
and I'd guess that may be a more frequent problem than Git guessing wrong. The latter
can be fixed by a user adding an extra  .gitattributes  entry explicitly specifying the file was binary,
whereas the former seems to introduce a confounding factor.
So probably a good line to have.
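
The corruption risk quoted above comes from applying CRLF-to-LF conversion to bytes that are not text. A minimal Python sketch of the idea follows. The NUL-byte check is a simplified stand-in for binary detection, not Git's actual algorithm, and all function names and sample bytes are made up for illustration.

```python
def looks_binary(data: bytes, sample_size: int = 8000) -> bool:
    """Crude heuristic (an assumption, not Git's real logic):
    a NUL byte near the start of the content means binary."""
    return b"\x00" in data[:sample_size]


def normalise_to_lf(data: bytes) -> bytes:
    """Replace every CRLF pair with LF, leaving lone CR bytes alone."""
    return data.replace(b"\r\n", b"\n")


def store(data: bytes) -> bytes:
    """Normalise only content that the heuristic classifies as text."""
    return data if looks_binary(data) else normalise_to_lf(data)


# A Tonel-like source file saved with Windows line endings.
tonel_source = b"Class {\r\n\t#name : #Foo\r\n}\r\n"

# Hypothetical binary payload without a NUL byte: the heuristic
# misclassifies it as text, so one CR byte is silently dropped.
misdetected_blob = b"\x89IMG\r\n\x1a\n\xff\xfe"

stored_source = store(tonel_source)
stored_blob = store(misdetected_blob)
```

The source file comes out with LF endings as intended, while the misdetected blob loses a byte; an explicit binary entry in .gitattributes for such files avoids relying on the guess.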

    *.st text merge=union eol=lf

    could fix the problem?
    can someone confirm this?


"eol=lf"   looks appropriate...
https://www.scivision.co/git-line-endings-windows-cygwin-wsl/

Most editors on Windows transparently handle LF line endings.
https://en.wikipedia.org/wiki/Comparison_of_text_editors#Newline_support


" merge=union" I am not familiar with, but I read at... https://git-scm.com/docs/gitattributes
"union = Run 3-way file level merge for text files, but take lines from both versions, instead of leaving conflict markers.
This tends to leave the added lines in the resulting file in random** order and the user should verify the result.
Do not use this if you do not understand the implications."

What are the implications of lines being merged in a random order?
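
One way to picture the implication: when both sides change the same region, union keeps the lines from both sides instead of asking anyone to choose. The toy Python model below covers a single conflicting hunk only. `union_hunk` is a hypothetical helper, not Git's implementation, the ordering is plain concatenation for simplicity, and the Tonel-like method bodies are invented for the example.

```python
def union_hunk(ours: list[str], theirs: list[str]) -> list[str]:
    """Toy model of merge=union for one conflicting hunk:
    keep the lines of both sides, with no conflict markers."""
    return ours + theirs


# Case 1: each side adds a different method at the same spot.
# Both methods survive intact, which is the benign outcome.
ours_added = ["Foo >> bar [", "\t^ 1", "]"]
theirs_added = ["Foo >> baz [", "\t^ 2", "]"]
both_methods = union_hunk(ours_added, theirs_added)

# Case 2: each side edits the body of the same method.
# Both bodies are kept, so the method ends up with two return
# statements and nobody is told that a choice was needed.
ours_body = ["\t^ self size + 1"]
theirs_body = ["\t^ self size * 2"]
merged_method = ["Foo >> bar ["] + union_hunk(ours_body, theirs_body) + ["]"]
```

In the second case the file still parses, but its behaviour depends on which side happened to come first, which is why the documentation asks the user to verify the result.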


btw, has doing a callback from libgit2 to a custom merge driver in Pharo been considered?
https://libgit2.github.com/libgit2/#HEAD/group/callback/git_merge_driver_apply_fn

There is a merge driver for parts of the FileTree format, implemented in Pharo; it could be done on a more general basis if the Tonel format exhibits more conflicts than usual.

But (and this is a big "but"), mixed Pharo / other things repositories with very large files to merge could make things very hard on a smalltalk-implemented merge algorithm.

I was thinking that a Smalltalk-implemented merge algorithm would only be used for the Smalltalk/Tonel code,
not for any other files.  And maybe, when a merge is invoked from Iceberg, the callback to the merge driver
might present conflicts in a GUI to be resolved, but I guess that would require a threaded VM.

cheers -ben



In most (all?) of my professional work, this is the case. I have among my projects a mix of FPGA design (Verilog + VHDL) + C (drivers, runtime) + Smalltalk, and the Smalltalk part is small.

btw2, I found (https://githubengineering.com/move-fast/) interesting...
saying... "Despite being a C library, libgit2 contains many powerful abstractions to accomplish complex tasks that Git simply cannot do. One of these features are indexes that exist solely in memory and allow work-tree related operations to be performed without an actual working directory. [...]  With the in-memory index, libgit2 is capable of merging two trees in a repository without having to check out any of their files on disk."

Yes, I considered that for GitFileTree. The current version uses fast-import and archive (to write and read, respectively) and in truth could work on a bare repository, without a working tree.

Oh, by the way: it also solves the #lf issue, because you do everything the unix way, even on windows: GitFileTree now never touches a file of the host system.


Thierry




On 11 April 2018 at 05:55, Esteban Lorenzano <[hidden email] <mailto:[hidden email]>> wrote:

    or a .iceberg file?

    Esteban

    ps: yep, we need it… we will have it, why not start now?


Do you mean Iceberg would clone a repo, and from its included  .iceberg  file
a matching  .gitattributes  file would be created?
That seems like double handling.
Why not have the user edit the  .gitattributes  file directly from Iceberg?
Iceberg might provide some appropriate templates.


cheers -ben





Re: help wanted: normalising LF on tonel for Pharo project

Thierry Goubier
On 12/04/2018 at 03:54, Ben Coman wrote:

>
>
>
>
> I was thinking that a smalltalk-implemented merge algorithm would only
> be used for the Smalltalk/Tonel code,
> not for any other files.  And maybe, when a merge is invoked from
> Iceberg, the callback to the merge-driver
> might present conflicts in a GUI to be resolved, but I guess such would
> require a threaded-VM.

Two things then.

- What happens if the C developer does a merge in a multi-language
project containing Tonel files?

- What is the difference from setting and providing a merge driver in
Git, which has the ability to work even without libgit2?

Thierry

>
> cheers -ben
>
>
>
...


Re: help wanted: normalising LF on tonel for Pharo project

Ben Coman


On 12 April 2018 at 12:39, Thierry Goubier <[hidden email]> wrote:
On 12/04/2018 at 03:54, Ben Coman wrote:

I was thinking that a Smalltalk-implemented merge algorithm would only be used for the Smalltalk/Tonel code,
not for any other files.  And maybe, when a merge is invoked from Iceberg, the callback to the merge-driver
might present conflicts in a GUI to be resolved, but I guess such would require a threaded-VM.

Two things then.

- What happens if the C developer does a merge in a multi-language project containing tonel files? 
- What is the difference from setting and providing a merge driver in Git, which has the ability to work even without libgit2?

I don't quite understand the question.  By "setting" do you mean in .gitattributes?

The same merge-algorithm could be invoked in two ways.

The first would be "externally" from the shell,
booting an Image to invoke the merge-algorithm
with the files-to-process as arguments. This wouldn't need libgit.
Conflicts could be left marked in the text files, similar to the existing merge,
or the running Image could present them in a GUI to resolve.
A tool is required either way.
I guess one of the existing options does this already?

But I had imagined a problem(?) in an already running Image, with Iceberg doing a merge through libgit
being able to invoke the merge driver in the already running Image. 
Now you've made me think it through more, I see some holes.
Perhaps it's okay to have two Images running by using the "external" way anyway;
or the merge done purely "internally" before touching git, and just present the resultant "index" to libgit;
and my idea is not needed as a third way.  
Now I find I don't know the depths of the Pharo / libgit interface enough to speculate further here.

cheers -ben

Re: help wanted: normalising LF on tonel for Pharo project

Thierry Goubier
2018-04-12 7:51 GMT+02:00 Ben Coman <[hidden email]>:

>
>
> On 12 April 2018 at 12:39, Thierry Goubier <[hidden email]>
> wrote:
>>
>> On 12/04/2018 at 03:54, Ben Coman wrote:
>>>
>>>
>>> I was thinking that a smalltalk-implemented merge algorithm would only be
>>> used for the Smalltalk/Tonel code,
>>> not for any other files.  And maybe, when a merge is invoked from
>>> Iceberg, the callback to the merge-driver
>>> might present conflicts in a GUI to be resolved, but I guess such would
>>> require a threaded-VM.
>>
>>
>> Two things then.
>>
>> - What happens if the C developer does a merge in a multi-language project
>> containing tonel files?
>>
>> - What is the difference from setting and providing a merge driver in Git,
>> which has the ability to work even without libgit2?
>
>
> I don't quite understand the question.  By "setting" do you mean in
> .gitattributes?

Yes.

> The same merge-algorithm could be invoked in two ways.
>
> The first would be "externally" from the shell,
> booting an Image to invoke the merge-algorithm
> with the files-to-process as arguments. This wouldn't need libgit.
> Conflicts could be left marked in the text files, similar to the existing merge,
> or the running Image could present them in a GUI to resolve.
> A tool is required either way.
> I guess one of the existing options does this already?

Yes. The GitFileTree-MergeDriver works like that. It is a headless
image, now built out of the Pharo6 minimal image, and it is called by
git for specific file types.

What is missing is the GUI part, like what a git merge tool such as
meld provides.

> But I had imagined a problem(?) in an already running Image, with Iceberg
> doing a merge through libgit
> being able to invoke the merge driver in the already running Image.
> Now you've made me think it through more, I see some holes.
> Perhaps it's okay to have two Images running by using the "external" way
> anyway;

Yes: in that case the merge driver is called in another process
anyway. Also, you want to reduce the overhead (make the pharo-based
merge driver as fast as possible to start, because git will start it
for each file to merge). The nice thing with that setup is that the
common ancestor search is done by git.
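
As a rough sketch of that contract: git can be told, through a `merge.<name>.driver` configuration entry and a matching `merge=<name>` attribute in .gitattributes, to run a command with the paths of the common ancestor (%O), the current version (%A) and the other branch's version (%B). The driver leaves its result in the %A file and exits non-zero if it could not merge cleanly. The Python below only illustrates that shape with a trivial whole-file strategy; it is not the GitFileTree-MergeDriver, and the script name `tonel_merge.py` is made up.

```python
import sys
from pathlib import Path


def merge_files(ancestor: Path, current: Path, other: Path) -> int:
    """Whole-file three-way merge: returns 0 when clean, 1 on conflict.

    The result is written back to `current`, which is where git
    expects a merge driver to leave it.
    """
    base = ancestor.read_bytes()
    ours = current.read_bytes()
    theirs = other.read_bytes()

    if ours == theirs or theirs == base:
        return 0  # nothing to take from the other side
    if ours == base:
        current.write_bytes(theirs)  # only the other side changed
        return 0
    return 1  # both sides changed differently: report a conflict


if __name__ == "__main__" and len(sys.argv) == 4:
    # Assumed invocation: python tonel_merge.py %O %A %B
    sys.exit(merge_files(*map(Path, sys.argv[1:4])))
```

A real driver would replace the conflict branch with format-aware logic; the start-up cost matters because the whole process is launched once per file.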

> or the merge done purely "internally" before touching git, and just present
> the resultant "index" to libgit;

That one has the issue of having to deal with non-Smalltalk files
(i.e. writing a generic merge tool). If you restrict yourself to
Smalltalk merges, then that can be fine (i.e. you merge with
Monticello, and the resulting package is stored as a merge in the git
store). But you reverse who is the master: in the latter, Smalltalk
is the master, and git stores the Smalltalk results; with the merge
driver, git is the master, and Smalltalk does what git requires.

But someone can still come from the outside of the repo and do a merge
on the command line; which is also fine.

> and my idea is not needed as a third way.
> Now I find I don't know the depths of the Pharo / libgit interface enough to
> speculate further here.

Yes, it seems that there are multiple ways to interface with it.

Thierry

> cheers -ben
