FileReference>>/ and path canonicalisation

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

FileReference>>/ and path canonicalisation

alistairgrant
Pharo's FileSystem behaviour is currently somewhat inconsistent in its
treatment of path strings.

To be able to demonstrate the limitations and fixes, I'm assuming that
the following files and directories have been created (/dev/shm is a
memory based file system in Ubuntu):

cd /dev/shm
mkdir -p d1/d2/d3/d4
touch d1/t1.txt d1/d2/t2.txt d1/d2/d3/t3.txt d1/d2/d3/d4/t4.txt
pushd d1 && ln -s d2/d3 ./s3 && popd



1. Path's are canonicalised during initial creation, but are not when
extending the path, i.e. sending #/

E.g.

'/dev/shm/d1/d2/../t1.txt' asFileReference  " File @ /dev/shm/d1/t1.txt"

'/dev/shm/d1/' asFileReference / 'd2/../t1.txt'  " File @ /dev/shm/d1/d2/../t1.txt"

Automatic canonicalisation is problematic as the code doesn't handle
symbolic links in the path name:

('/dev/shm/d1/' asFileReference / 's3/../t2.txt') exists  " true (correct answer)"

'/dev/shm/d1/s3/../t2.txt' asFileReference exists  " false (wrong answer)"


2. In an attempt to be completely general, the argument to #/ is assumed
to be a single directory / file.  This is incorrect as the path
delimiter is not allowed to be part of the file name.  The result is
that path comparisons and operations such as #parent give unexpected
results.

E.g.

('/dev/shm/d1/' asFileReference / 'd2/d3/t3.txt') exists  " true"

('/dev/shm/d1/' asFileReference / 'd2/d3/t3.txt') parent  " File @ /dev/shm/d1"


My PR modified FileSystem so that:

1. Canonicalisation is separated out as an operation that has to be
manually requested:

'/dev/shm/d1/d2/../t1.txt' asFileReference  " File @ /dev/shm/d1/d2/../t1.txt"

'/dev/shm/d1/d2/../t1.txt' asFileReference canonicalize  " File @ /dev/shm/d1/t1.txt"


2. The argument to #/ can be either a single file / directory name or a
path.

('/dev/shm/d1/' asFileReference / 'd2/d3/t3.txt') parent  " File @ /dev/shm/d1/d2/d3"



3. Adds unit tests to cover the changed behaviour.


While I believe that these changes improve Pharo overall, making it more
intuitive and consistent, modifying the path creation to remove
canonicalisation is not backward compatible and requires a change to
PathTest>>testRelativeFromStringNormalization and
PathTest>>testRelativeFromStringNormalizationParent to manually
canonicalise the paths.

Before I submit the patch, does anyone have any strong objections to the
changes as described above?

Thanks,
Alistair

Reply | Threaded
Open this post in threaded view
|

Re: FileReference>>/ and path canonicalisation

Stephane Ducasse-3
Thanks Alistair
I'm buzzy and I was trying to learn how to integrate issues in Pharo
70 but so far I was too tired when pavel explained that to me.

On Mon, Jun 26, 2017 at 10:12 AM, Alistair Grant <[hidden email]> wrote:

> Pharo's FileSystem behaviour is currently somewhat inconsistent in its
> treatment of path strings.
>
> To be able to demonstrate the limitations and fixes, I'm assuming that
> the following files and directories have been created (/dev/shm is a
> memory based file system in Ubuntu):
>
> cd /dev/shm
> mkdir -p d1/d2/d3/d4
> touch d1/t1.txt d1/d2/t2.txt d1/d2/d3/t3.txt d1/d2/d3/d4/t4.txt
> pushd d1 && ln -s d2/d3 ./s3 && popd
>
>
>
> 1. Path's are canonicalised during initial creation, but are not when
> extending the path, i.e. sending #/
>
> E.g.
>
> '/dev/shm/d1/d2/../t1.txt' asFileReference  " File @ /dev/shm/d1/t1.txt"
>
> '/dev/shm/d1/' asFileReference / 'd2/../t1.txt'  " File @ /dev/shm/d1/d2/../t1.txt"
>
> Automatic canonicalisation is problematic as the code doesn't handle
> symbolic links in the path name:
>
> ('/dev/shm/d1/' asFileReference / 's3/../t2.txt') exists  " true (correct answer)"
>
> '/dev/shm/d1/s3/../t2.txt' asFileReference exists  " false (wrong answer)"
>
>
> 2. In an attempt to be completely general, the argument to #/ is assumed
> to be a single directory / file.  This is incorrect as the path
> delimiter is not allowed to be part of the file name.  The result is
> that path comparisons and operations such as #parent give unexpected
> results.
>
> E.g.
>
> ('/dev/shm/d1/' asFileReference / 'd2/d3/t3.txt') exists  " true"
>
> ('/dev/shm/d1/' asFileReference / 'd2/d3/t3.txt') parent  " File @ /dev/shm/d1"
>
>
> My PR modified FileSystem so that:
>
> 1. Canonicalisation is separated out as an operation that has to be
> manually requested:
>
> '/dev/shm/d1/d2/../t1.txt' asFileReference  " File @ /dev/shm/d1/d2/../t1.txt"
>
> '/dev/shm/d1/d2/../t1.txt' asFileReference canonicalize  " File @ /dev/shm/d1/t1.txt"
>
>
> 2. The argument to #/ can be either a single file / directory name or a
> path.
>
> ('/dev/shm/d1/' asFileReference / 'd2/d3/t3.txt') parent  " File @ /dev/shm/d1/d2/d3"
>
>
>
> 3. Adds unit tests to cover the changed behaviour.
>
>
> While I believe that these changes improve Pharo overall, making it more
> intuitive and consistent, modifying the path creation to remove
> canonicalisation is not backward compatible and requires a change to
> PathTest>>testRelativeFromStringNormalization and
> PathTest>>testRelativeFromStringNormalizationParent to manually
> canonicalise the paths.
>
> Before I submit the patch, does anyone have any strong objections to the
> changes as described above?
>
> Thanks,
> Alistair
>

Reply | Threaded
Open this post in threaded view
|

Re: FileReference>>/ and path canonicalisation

Sven Van Caekenberghe-2
In reply to this post by alistairgrant
Hi Alistair,

I think it is great that you are working on FileSystem and are trying to contribute. I appreciate your effort to find consensus.

But, and please take this as positive criticism, I feel a bit uneasy when I read your reasoning, but maybe I am wrong.

You see, the way I understand FileSystem (what I think was the original design idea), it is a cross platform abstraction over concrete file systems and paths.

A FileReference holds a FileSystem and a Path. A Path is just a collection of elements that leads to a location, the leaf possibly being interpreted as a file with an extension (although that last point is just a convention). It seems great effort was put into avoiding the use of separators.

Pharo is an object oriented system with appropriate abstractions, FileSystem gives us one approach to this difficult problem.

In your reasoning, your truth, your reference is always the Unix file system and the way paths are handled there. You expect a number of things based on that, but maybe that was/is not the design goal.


I said this before, but I am not sure we should interpret the argument of #/.

I think that

  FileLocator root / 'foo' / 'bar' / 'readme.txt'.

is more abstract (fitting to the original goal) then

  '/foo' asFileReference / 'bar/readme.txt'.

because it totally avoids a reference to the platform dependent path separator.

The same goes for '..' as

  (FileLocator root / 'foo' / 'bar' / 'down') parent / 'readme.txt'.

is a more object oriented way to write '/foo/bar/down/../readme.txt', IMO.

Windows is an important target for Pharo, they use $\ not $/.

Do you see my point ?

BTW, I consider the fact that current,

 '/foo' asFileReference / 'bar/readme.txt'

works with #exists even though 'bar/readme.txt' was not fully parsed/resolved a bug, or a happy coincidence at best.

Another point is the distinction between internal/external and/or concrete/abstract paths. Should something like '..' remain part of a path. Never, always, only when it can be resolved ? What about special characters ?

Should we support ~ ? Sometimes I would like that too, but maybe

  FileLocator home

is enough.

All this being said, I think FileSystem can and should be improved, but carefully.

It would be good if more people joined this discussion.

Sven

> On 26 Jun 2017, at 10:12, Alistair Grant <[hidden email]> wrote:
>
> Pharo's FileSystem behaviour is currently somewhat inconsistent in its
> treatment of path strings.
>
> To be able to demonstrate the limitations and fixes, I'm assuming that
> the following files and directories have been created (/dev/shm is a
> memory based file system in Ubuntu):
>
> cd /dev/shm
> mkdir -p d1/d2/d3/d4
> touch d1/t1.txt d1/d2/t2.txt d1/d2/d3/t3.txt d1/d2/d3/d4/t4.txt
> pushd d1 && ln -s d2/d3 ./s3 && popd
>
>
>
> 1. Path's are canonicalised during initial creation, but are not when
> extending the path, i.e. sending #/
>
> E.g.
>
> '/dev/shm/d1/d2/../t1.txt' asFileReference  " File @ /dev/shm/d1/t1.txt"
>
> '/dev/shm/d1/' asFileReference / 'd2/../t1.txt'  " File @ /dev/shm/d1/d2/../t1.txt"
>
> Automatic canonicalisation is problematic as the code doesn't handle
> symbolic links in the path name:
>
> ('/dev/shm/d1/' asFileReference / 's3/../t2.txt') exists  " true (correct answer)"
>
> '/dev/shm/d1/s3/../t2.txt' asFileReference exists  " false (wrong answer)"
>
>
> 2. In an attempt to be completely general, the argument to #/ is assumed
> to be a single directory / file.  This is incorrect as the path
> delimiter is not allowed to be part of the file name.  The result is
> that path comparisons and operations such as #parent give unexpected
> results.
>
> E.g.
>
> ('/dev/shm/d1/' asFileReference / 'd2/d3/t3.txt') exists  " true"
>
> ('/dev/shm/d1/' asFileReference / 'd2/d3/t3.txt') parent  " File @ /dev/shm/d1"
>
>
> My PR modified FileSystem so that:
>
> 1. Canonicalisation is separated out as an operation that has to be
> manually requested:
>
> '/dev/shm/d1/d2/../t1.txt' asFileReference  " File @ /dev/shm/d1/d2/../t1.txt"
>
> '/dev/shm/d1/d2/../t1.txt' asFileReference canonicalize  " File @ /dev/shm/d1/t1.txt"
>
>
> 2. The argument to #/ can be either a single file / directory name or a
> path.
>
> ('/dev/shm/d1/' asFileReference / 'd2/d3/t3.txt') parent  " File @ /dev/shm/d1/d2/d3"
>
>
>
> 3. Adds unit tests to cover the changed behaviour.
>
>
> While I believe that these changes improve Pharo overall, making it more
> intuitive and consistent, modifying the path creation to remove
> canonicalisation is not backward compatible and requires a change to
> PathTest>>testRelativeFromStringNormalization and
> PathTest>>testRelativeFromStringNormalizationParent to manually
> canonicalise the paths.
>
> Before I submit the patch, does anyone have any strong objections to the
> changes as described above?
>
> Thanks,
> Alistair
>


Reply | Threaded
Open this post in threaded view
|

Re: FileReference>>/ and path canonicalisation

K K Subbu
In reply to this post by alistairgrant
On Monday 26 June 2017 01:42 PM, Alistair Grant wrote:

> 1. Path's are canonicalised during initial creation, but are not when
> extending the path, i.e. sending #/
>
> E.g.
>
> '/dev/shm/d1/d2/../t1.txt' asFileReference  " File @ /dev/shm/d1/t1.txt"
>
> '/dev/shm/d1/' asFileReference / 'd2/../t1.txt'  " File @ /dev/shm/d1/d2/../t1.txt"
>
> Automatic canonicalisation is problematic as the code doesn't handle
> symbolic links in the path name:
>
> ('/dev/shm/d1/' asFileReference / 's3/../t2.txt') exists  " true (correct answer)"
>
> '/dev/shm/d1/s3/../t2.txt' asFileReference exists  " false (wrong answer)"

The meaning of 's3/..' is ambiguous when s3 is a symlink. Should 's3' be
resolved or not before #parent op is applied? Disambiguation depends on
context, so early canonicalization is a bug.

My own preference is to not to canonicalize at all and 's3/..' should be
left as {s3, ..}. See

   http://man7.org/linux/man-pages/man7/path_resolution.7.html

> 2. In an attempt to be completely general, the argument to #/ is assumed
> to be a single directory / file.  This is incorrect as the path
> delimiter is not allowed to be part of the file name.  The result is
> that path comparisons and operations such as #parent give unexpected
> results.

Path class>>from:delimiter has a bug. It attempts to canonicalize
elements parsed from the String regardless of the context. If the code
is changed to:

  pathClass withAll: (aDelimiterCharacter split: aString)

your test will pass.

I wonder why Path was subclassed from Object and not from
SequenceableCollection?

> My PR modified FileSystem so that:
>
> 1. Canonicalisation is separated out as an operation that has to be
> manually requested:
>
> '/dev/shm/d1/d2/../t1.txt' asFileReference  " File @ /dev/shm/d1/d2/../t1.txt"
>
> '/dev/shm/d1/d2/../t1.txt' asFileReference canonicalize  " File @ /dev/shm/d1/t1.txt"

I am ok with this.

> 2. The argument to #/ can be either a single file / directory name or a
> path.
>
> ('/dev/shm/d1/' asFileReference / 'd2/d3/t3.txt') parent  " File @ /dev/shm/d1/d2/d3"

ok with this too.

> 3. Adds unit tests to cover the changed behaviour.
>...
> Before I submit the patch, does anyone have any strong objections to the
> changes as described above?

None from my side.

Regards .. Subbu

Reply | Threaded
Open this post in threaded view
|

Re: FileReference>>/ and path canonicalisation

K K Subbu
In reply to this post by Sven Van Caekenberghe-2
On Tuesday 27 June 2017 05:01 PM, Sven Van Caekenberghe wrote:
> A FileReference holds a FileSystem and a Path. A Path is just a
> collection of elements that leads to a location, the leaf possibly
> being interpreted as a file with an extension (although that last
> point is just a convention). It seems great effort was put into
> avoiding the use of separators.
>
> Pharo is an object oriented system with appropriate abstractions,
> FileSystem gives us one approach to this difficult problem.

+1. Pharo's elegant design captures the essentials of naming file objects.

> In your reasoning, your truth, your reference is always the Unix file
> system and the way paths are handled >    FileLocator root / 'foo' /
> 'bar' / 'readme.txt'.
>
> is more abstract (fitting to the original goal) then
>
> '/foo' asFileReference / 'bar/readme.txt'.
>
> because it totally avoids a reference to the platform dependent path
> separator.

Agreed that the first form is quite elegant. But I see no harm in
graceful acceptance of the second. Such strings could be encountered in
scripts and variables. It is also a convenient shorthand.

> Windows is an important target for Pharo, they use $\ not $/.

Windows cmd shell uses only $\ while its API accepts both. This is a
eternal meme thread in Win32 mailing lists ;-).

> Another point is the distinction between internal/external and/or
> concrete/abstract paths. Should something like '..' remain part of a
> path. Never, always, only when it can be resolved ? What about
> special characters ?

Path elements are platform-specific and their resolution to specific
inode is context-specific. Canonicalization is only needed in low level
code.

> Should we support ~ ? Sometimes I would like that too, but maybe
>
> FileLocator home

Elements like ~ or $HOME are expanded by shell before calling host API.
GUI shells based on XDG also expand other variables (see
~/.config/user-dirs.dirs) It is upto Pharo to define its own variables
and expand them just before calling APIs (say from {$PHARO_HOME,
"pharo.conf"}

Regards .. Subbu

Reply | Threaded
Open this post in threaded view
|

Re: FileReference>>/ and path canonicalisation

alistairgrant
In reply to this post by Sven Van Caekenberghe-2
Hi Sven,

On Tue, Jun 27, 2017 at 01:31:42PM +0200, Sven Van Caekenberghe wrote:
> Hi Alistair,
>
> I think it is great that you are working on FileSystem and are trying
> to contribute. I appreciate your effort to find consensus.

:-)


> But, and please take this as positive criticism, I feel a bit uneasy
> when I read your reasoning, but maybe I am wrong.

Thanks for taking the trouble to reply.  Hopefully I can address some of
your unease below.


> You see, the way I understand FileSystem (what I think was the
> original design idea), it is a cross platform abstraction over
> concrete file systems and paths.

It does appear to be the case that effort was put in to avoid specifying
a separator, but it makes an assumption that is incorrect. The class
comments of path state that the seperator can be part of a file name,
and this is incorrect for every disk file system Pharo supports (as far
as I know).



> A FileReference holds a FileSystem and a Path. A Path is just a
> collection of elements that leads to a location, the leaf possibly
> being interpreted as a file with an extension (although that last
> point is just a convention). It seems great effort was put into
> avoiding the use of separators.

> Pharo is an object oriented system with appropriate abstractions,
> FileSystem gives us one approach to this difficult problem.
>
> In your reasoning, your truth, your reference is always the Unix file
> system and the way paths are handled there. You expect a number of
> things based on that, but maybe that was/is not the design goal.

I do have a bias towards the Unix filesystem, and you're correct that
all the examples I provided use the Unix file system, but I've tried to
make sure that the changes I've made apply equally to Windows file
systems (more below).


> I said this before, but I am not sure we should interpret the argument of #/.

I'll come back to this later :-)


> I think that
>
>   FileLocator root / 'foo' / 'bar' / 'readme.txt'.
>
> is more abstract (fitting to the original goal) then
>
>   '/foo' asFileReference / 'bar/readme.txt'.
>
> because it totally avoids a reference to the platform dependent path separator.

I wasn't trying to suggest that code would normally be written this way.  
It is more likely that '/foo' asFileReference is created early in the
code, and much later an input is supplied which is 'bar/readme.txt'.  At
that point, the natural thing to do is send #/ as above.

The problem with only supporting the first case above is that it puts
the onus of parsing the string on the user.  We could add a another
method that parses the supplied string, but that will just make the
interface more confusing, and we already have a steady stream of
messages to the list from people getting confused by this.


> The same goes for '..' as
>
>   (FileLocator root / 'foo' / 'bar' / 'down') parent / 'readme.txt'.
>
> is a more object oriented way to write '/foo/bar/down/../readme.txt', IMO.

Agreed, but as mentioned above, we also have to deal with input strings
from external sources.


> Windows is an important target for Pharo, they use $\ not $/.

The patch handles that, so on windows I can do:

('C:\cygwin' asFileReference / 'usr\local\bin') parent  " File @ C:\cygwin\usr\local"

(just to be clear, this is an example of how the strings are handled,
I'm not expecting anyone to write code like the above).


> Do you see my point ?

I think so, but... :-)


> BTW, I consider the fact that current,
>
>  '/foo' asFileReference / 'bar/readme.txt'
>
> works with #exists even though 'bar/readme.txt' was not fully
> parsed/resolved a bug, or a happy coincidence at best.

I agree that it is a happy coincedence, but it is also intuitive.


> Another point is the distinction between internal/external and/or
> concrete/abstract paths. Should something like '..' remain part of a
> path. Never, always, only when it can be resolved ? What about special
> characters ?

I'm arguing that '..' should be left in the path unless explicitly told
to be removed, so that things like symbolic links are handled properly
(by the file system itself).  (Thanks to Denis for reminding me about
this in an earlier email thread)

While it's nice to be completely general, is there a supported file
system that doesn't interpret ".." as the parent directory?

As an aside, another patch I'll be submitting fixes symbolic link
handling so we could conveivably write a #canonicalizeOnFileSystem that
would properly handle 'symboliclink/..', but it would be slow.


> Should we support ~ ? Sometimes I would like that too, but maybe
>
>   FileLocator home
>
> is enough.

As Subbu points out in a later message, things like '~' are handled by
the shell.  We could consider adding something like #expand to handle
'~' and shell variables, e.g. $HOME, but I'd make that a separate patch.

It would also need to be done carefully as $ and ~ are valid characters
within file names (only / and the null character are not allowed in
Posix file systems, Windows has more characters that are not allowed).



> All this being said, I think FileSystem can and should be improved,
> but carefully.
>
> It would be good if more people joined this discussion.
>
> Sven

I think I understand the original goals, and I don't disagree with them,
however:

1. they are based on an arguably incorrect assumption (the
separator character can be part of a file name),

2. they've caused quite a bit of confusion due to unexpected behaviour,

3. if support for a file system that allows the separator in file names
is ever added, we will still have to be able to parse a full path
correctly (which the current code wouldn't do)

4. I don't think these changes conflict with the original goals, all the
examples you provided above still function exactly the same.

Does that help reduce your uneasiness?

Thanks again,
Alistair




> > On 26 Jun 2017, at 10:12, Alistair Grant <[hidden email]> wrote:
> >
> > Pharo's FileSystem behaviour is currently somewhat inconsistent in its
> > treatment of path strings.
> >
> > To be able to demonstrate the limitations and fixes, I'm assuming that
> > the following files and directories have been created (/dev/shm is a
> > memory based file system in Ubuntu):
> >
> > cd /dev/shm
> > mkdir -p d1/d2/d3/d4
> > touch d1/t1.txt d1/d2/t2.txt d1/d2/d3/t3.txt d1/d2/d3/d4/t4.txt
> > pushd d1 && ln -s d2/d3 ./s3 && popd
> >
> >
> >
> > 1. Path's are canonicalised during initial creation, but are not when
> > extending the path, i.e. sending #/
> >
> > E.g.
> >
> > '/dev/shm/d1/d2/../t1.txt' asFileReference  " File @ /dev/shm/d1/t1.txt"
> >
> > '/dev/shm/d1/' asFileReference / 'd2/../t1.txt'  " File @ /dev/shm/d1/d2/../t1.txt"
> >
> > Automatic canonicalisation is problematic as the code doesn't handle
> > symbolic links in the path name:
> >
> > ('/dev/shm/d1/' asFileReference / 's3/../t2.txt') exists  " true (correct answer)"
> >
> > '/dev/shm/d1/s3/../t2.txt' asFileReference exists  " false (wrong answer)"
> >
> >
> > 2. In an attempt to be completely general, the argument to #/ is assumed
> > to be a single directory / file.  This is incorrect as the path
> > delimiter is not allowed to be part of the file name.  The result is
> > that path comparisons and operations such as #parent give unexpected
> > results.
> >
> > E.g.
> >
> > ('/dev/shm/d1/' asFileReference / 'd2/d3/t3.txt') exists  " true"
> >
> > ('/dev/shm/d1/' asFileReference / 'd2/d3/t3.txt') parent  " File @ /dev/shm/d1"
> >
> >
> > My PR modified FileSystem so that:
> >
> > 1. Canonicalisation is separated out as an operation that has to be
> > manually requested:
> >
> > '/dev/shm/d1/d2/../t1.txt' asFileReference  " File @ /dev/shm/d1/d2/../t1.txt"
> >
> > '/dev/shm/d1/d2/../t1.txt' asFileReference canonicalize  " File @ /dev/shm/d1/t1.txt"
> >
> >
> > 2. The argument to #/ can be either a single file / directory name or a
> > path.
> >
> > ('/dev/shm/d1/' asFileReference / 'd2/d3/t3.txt') parent  " File @ /dev/shm/d1/d2/d3"
> >
> >
> >
> > 3. Adds unit tests to cover the changed behaviour.
> >
> >
> > While I believe that these changes improve Pharo overall, making it more
> > intuitive and consistent, modifying the path creation to remove
> > canonicalisation is not backward compatible and requires a change to
> > PathTest>>testRelativeFromStringNormalization and
> > PathTest>>testRelativeFromStringNormalizationParent to manually
> > canonicalise the paths.
> >
> > Before I submit the patch, does anyone have any strong objections to the
> > changes as described above?
> >
> > Thanks,
> > Alistair

Reply | Threaded
Open this post in threaded view
|

Re: FileReference>>/ and path canonicalisation

alistairgrant
In reply to this post by K K Subbu
Hi Subbu,

On Tue, Jun 27, 2017 at 09:19:20PM +0530, K K Subbu wrote:

> On Monday 26 June 2017 01:42 PM, Alistair Grant wrote:
> >1. Path's are canonicalised during initial creation, but are not when
> >extending the path, i.e. sending #/
> >
> >E.g.
> >
> >'/dev/shm/d1/d2/../t1.txt' asFileReference  " File @ /dev/shm/d1/t1.txt"
> >
> >'/dev/shm/d1/' asFileReference / 'd2/../t1.txt'  " File @ /dev/shm/d1/d2/../t1.txt"
> >
> >Automatic canonicalisation is problematic as the code doesn't handle
> >symbolic links in the path name:
> >
> >('/dev/shm/d1/' asFileReference / 's3/../t2.txt') exists  " true (correct answer)"
> >
> >'/dev/shm/d1/s3/../t2.txt' asFileReference exists  " false (wrong answer)"
>
> The meaning of 's3/..' is ambiguous when s3 is a symlink. Should 's3' be
> resolved or not before #parent op is applied? Disambiguation depends on
> context, so early canonicalization is a bug.
>
> My own preference is to not to canonicalize at all and 's3/..' should be
> left as {s3, ..}. See
>
>   http://man7.org/linux/man-pages/man7/path_resolution.7.html
>
> >2. In an attempt to be completely general, the argument to #/ is assumed
> >to be a single directory / file.  This is incorrect as the path
> >delimiter is not allowed to be part of the file name.  The result is
> >that path comparisons and operations such as #parent give unexpected
> >results.
>
> Path class>>from:delimiter has a bug. It attempts to canonicalize elements
> parsed from the String regardless of the context. If the code is changed to:
>
>  pathClass withAll: (aDelimiterCharacter split: aString)
>
> your test will pass.

This is basically what I do (it also removes empty segments).


> I wonder why Path was subclassed from Object and not from
> SequenceableCollection?
>
> >My PR modified FileSystem so that:
> >
> >1. Canonicalisation is separated out as an operation that has to be
> >manually requested:
> >
> >'/dev/shm/d1/d2/../t1.txt' asFileReference  " File @ /dev/shm/d1/d2/../t1.txt"
> >
> >'/dev/shm/d1/d2/../t1.txt' asFileReference canonicalize  " File @ /dev/shm/d1/t1.txt"
>
> I am ok with this.
>
> >2. The argument to #/ can be either a single file / directory name or a
> >path.
> >
> >('/dev/shm/d1/' asFileReference / 'd2/d3/t3.txt') parent  " File @ /dev/shm/d1/d2/d3"
>
> ok with this too.
>
> >3. Adds unit tests to cover the changed behaviour.
> >...
> >Before I submit the patch, does anyone have any strong objections to the
> >changes as described above?
>
> None from my side.
>
> Regards .. Subbu

Thanks!
Alistair


Reply | Threaded
Open this post in threaded view
|

Re: FileReference>>/ and path canonicalisation

Stephane Ducasse-3
In reply to this post by alistairgrant
Hi alistair

I do not know if this is me that wrote a wrong path class comment.
You should consider that theere were nearly no comment at all and I
started to try to give more love to this great library.

Stef

On Tue, Jun 27, 2017 at 8:56 PM, Alistair Grant <[hidden email]> wrote:

> Hi Sven,
>
> On Tue, Jun 27, 2017 at 01:31:42PM +0200, Sven Van Caekenberghe wrote:
>> Hi Alistair,
>>
>> I think it is great that you are working on FileSystem and are trying
>> to contribute. I appreciate your effort to find consensus.
>
> :-)
>
>
>> But, and please take this as positive criticism, I feel a bit uneasy
>> when I read your reasoning, but maybe I am wrong.
>
> Thanks for taking the trouble to reply.  Hopefully I can address some of
> your unease below.
>
>
>> You see, the way I understand FileSystem (what I think was the
>> original design idea), it is a cross platform abstraction over
>> concrete file systems and paths.
>
> It does appear to be the case that effort was put in to avoid specifying
> a separator, but it makes an assumption that is incorrect. The class
> comments of path state that the seperator can be part of a file name,
> and this is incorrect for every disk file system Pharo supports (as far
> as I know).
>
>
>
>> A FileReference holds a FileSystem and a Path. A Path is just a
>> collection of elements that leads to a location, the leaf possibly
>> being interpreted as a file with an extension (although that last
>> point is just a convention). It seems great effort was put into
>> avoiding the use of separators.
>
>> Pharo is an object oriented system with appropriate abstractions,
>> FileSystem gives us one approach to this difficult problem.
>>
>> In your reasoning, your truth, your reference is always the Unix file
>> system and the way paths are handled there. You expect a number of
>> things based on that, but maybe that was/is not the design goal.
>
> I do have a bias towards the Unix filesystem, and you're correct that
> all the examples I provided use the Unix file system, but I've tried to
> make sure that the changes I've made apply equally to Windows file
> systems (more below).
>
>
>> I said this before, but I am not sure we should interpret the argument of #/.
>
> I'll come back to this later :-)
>
>
>> I think that
>>
>>   FileLocator root / 'foo' / 'bar' / 'readme.txt'.
>>
>> is more abstract (fitting to the original goal) then
>>
>>   '/foo' asFileReference / 'bar/readme.txt'.
>>
>> because it totally avoids a reference to the platform dependent path separator.
>
> I wasn't trying to suggest that code would normally be written this way.
> It is more likely that '/foo' asFileReference is created early in the
> code, and much later an input is supplied which is 'bar/readme.txt'.  At
> that point, the natural thing to do is send #/ as above.
>
> The problem with only supporting the first case above is that it puts
> the onus of parsing the string on the user.  We could add a another
> method that parses the supplied string, but that will just make the
> interface more confusing, and we already have a steady stream of
> messages to the list from people getting confused by this.
>
>
>> The same goes for '..' as
>>
>>   (FileLocator root / 'foo' / 'bar' / 'down') parent / 'readme.txt'.
>>
>> is a more object oriented way to write '/foo/bar/down/../readme.txt', IMO.
>
> Agreed, but as mentioned above, we also have to deal with input strings
> from external sources.
>
>
>> Windows is an important target for Pharo, they use $\ not $/.
>
> The patch handles that, so on windows I can do:
>
> ('C:\cygwin' asFileReference / 'usr\local\bin') parent  " File @ C:\cygwin\usr\local"
>
> (just to be clear, this is an example of how the strings are handled,
> I'm not expecting anyone to write code like the above).
>
>
>> Do you see my point ?
>
> I think so, but... :-)
>
>
>> BTW, I consider the fact that current,
>>
>>  '/foo' asFileReference / 'bar/readme.txt'
>>
>> works with #exists even though 'bar/readme.txt' was not fully
>> parsed/resolved a bug, or a happy coincidence at best.
>
> I agree that it is a happy coincedence, but it is also intuitive.
>
>
>> Another point is the distinction between internal/external and/or
>> concrete/abstract paths. Should something like '..' remain part of a
>> path. Never, always, only when it can be resolved ? What about special
>> characters ?
>
> I'm arguing that '..' should be left in the path unless explicitly told
> to be removed, so that things like symbolic links are handled properly
> (by the file system itself).  (Thanks to Denis for reminding me about
> this in an earlier email thread)
>
> While it's nice to be completely general, is there a supported file
> system that doesn't interpret ".." as the parent directory?
>
> As an aside, another patch I'll be submitting fixes symbolic link
> handling so we could conveivably write a #canonicalizeOnFileSystem that
> would properly handle 'symboliclink/..', but it would be slow.
>
>
>> Should we support ~ ? Sometimes I would like that too, but maybe
>>
>>   FileLocator home
>>
>> is enough.
>
> As Subbu points out in a later message, things like '~' are handled by
> the shell.  We could consider adding something like #expand to handle
> '~' and shell variables, e.g. $HOME, but I'd make that a separate patch.
>
> It would also need to be done carefully as $ and ~ are valid characters
> within file names (only / and the null character are not allowed in
> Posix file systems, Windows has more characters that are not allowed).
>
>
>
>> All this being said, I think FileSystem can and should be improved,
>> but carefully.
>>
>> It would be good if more people joined this discussion.
>>
>> Sven
>
> I think I understand the original goals, and I don't disagree with them,
> however:
>
> 1. they are based on an arguably incorrect assumption (the
> separator character can be part of a file name),
>
> 2. they've caused quite a bit of confusion due to unexpected behaviour,
>
> 3. if support for a file system that allows the separator in file names
> is ever added, we will still have to be able to parse a full path
> correctly (which the current code wouldn't do)
>
> 4. I don't think these changes conflict with the original goals, all the
> examples you provided above still function exactly the same.
>
> Does that help reduce your uneasiness?
>
> Thanks again,
> Alistair
>
>
>
>
>> > On 26 Jun 2017, at 10:12, Alistair Grant <[hidden email]> wrote:
>> >
>> > Pharo's FileSystem behaviour is currently somewhat inconsistent in its
>> > treatment of path strings.
>> >
>> > To be able to demonstrate the limitations and fixes, I'm assuming that
>> > the following files and directories have been created (/dev/shm is a
>> > memory based file system in Ubuntu):
>> >
>> > cd /dev/shm
>> > mkdir -p d1/d2/d3/d4
>> > touch d1/t1.txt d1/d2/t2.txt d1/d2/d3/t3.txt d1/d2/d3/d4/t4.txt
>> > pushd d1 && ln -s d2/d3 ./s3 && popd
>> >
>> >
>> >
>> > 1. Path's are canonicalised during initial creation, but are not when
>> > extending the path, i.e. sending #/
>> >
>> > E.g.
>> >
>> > '/dev/shm/d1/d2/../t1.txt' asFileReference  " File @ /dev/shm/d1/t1.txt"
>> >
>> > '/dev/shm/d1/' asFileReference / 'd2/../t1.txt'  " File @ /dev/shm/d1/d2/../t1.txt"
>> >
>> > Automatic canonicalisation is problematic as the code doesn't handle
>> > symbolic links in the path name:
>> >
>> > ('/dev/shm/d1/' asFileReference / 's3/../t2.txt') exists  " true (correct answer)"
>> >
>> > '/dev/shm/d1/s3/../t2.txt' asFileReference exists  " false (wrong answer)"
>> >
>> >
>> > 2. In an attempt to be completely general, the argument to #/ is assumed
>> > to be a single directory / file.  This is incorrect as the path
>> > delimiter is not allowed to be part of the file name.  The result is
>> > that path comparisons and operations such as #parent give unexpected
>> > results.
>> >
>> > E.g.
>> >
>> > ('/dev/shm/d1/' asFileReference / 'd2/d3/t3.txt') exists  " true"
>> >
>> > ('/dev/shm/d1/' asFileReference / 'd2/d3/t3.txt') parent  " File @ /dev/shm/d1"
>> >
>> >
>> > My PR modified FileSystem so that:
>> >
>> > 1. Canonicalisation is separated out as an operation that has to be
>> > manually requested:
>> >
>> > '/dev/shm/d1/d2/../t1.txt' asFileReference  " File @ /dev/shm/d1/d2/../t1.txt"
>> >
>> > '/dev/shm/d1/d2/../t1.txt' asFileReference canonicalize  " File @ /dev/shm/d1/t1.txt"
>> >
>> >
>> > 2. The argument to #/ can be either a single file / directory name or a
>> > path.
>> >
>> > ('/dev/shm/d1/' asFileReference / 'd2/d3/t3.txt') parent  " File @ /dev/shm/d1/d2/d3"
>> >
>> >
>> >
>> > 3. Adds unit tests to cover the changed behaviour.
>> >
>> >
>> > While I believe that these changes improve Pharo overall, making it more
>> > intuitive and consistent, modifying the path creation to remove
>> > canonicalisation is not backward compatible and requires a change to
>> > PathTest>>testRelativeFromStringNormalization and
>> > PathTest>>testRelativeFromStringNormalizationParent to manually
>> > canonicalise the paths.
>> >
>> > Before I submit the patch, does anyone have any strong objections to the
>> > changes as described above?
>> >
>> > Thanks,
>> > Alistair
>

Reply | Threaded
Open this post in threaded view
|

Re: FileReference>>/ and path canonicalisation

Stephane Ducasse-3
In reply to this post by alistairgrant
Hi alistair

I'm not expert of file system. Now I like your idea to have explicit
messages such as canonalize or expand.

Stef

On Tue, Jun 27, 2017 at 8:56 PM, Alistair Grant <[hidden email]> wrote:

> Hi Sven,
>
> On Tue, Jun 27, 2017 at 01:31:42PM +0200, Sven Van Caekenberghe wrote:
>> Hi Alistair,
>>
>> I think it is great that you are working on FileSystem and are trying
>> to contribute. I appreciate your effort to find consensus.
>
> :-)
>
>
>> But, and please take this as positive criticism, I feel a bit uneasy
>> when I read your reasoning, but maybe I am wrong.
>
> Thanks for taking the trouble to reply.  Hopefully I can address some of
> your unease below.
>
>
>> You see, the way I understand FileSystem (what I think was the
>> original design idea), it is a cross platform abstraction over
>> concrete file systems and paths.
>
> It does appear to be the case that effort was put in to avoid specifying
> a separator, but it makes an assumption that is incorrect. The class
> comments of path state that the seperator can be part of a file name,
> and this is incorrect for every disk file system Pharo supports (as far
> as I know).
>
>
>
>> A FileReference holds a FileSystem and a Path. A Path is just a
>> collection of elements that leads to a location, the leaf possibly
>> being interpreted as a file with an extension (although that last
>> point is just a convention). It seems great effort was put into
>> avoiding the use of separators.
>
>> Pharo is an object oriented system with appropriate abstractions,
>> FileSystem gives us one approach to this difficult problem.
>>
>> In your reasoning, your truth, your reference is always the Unix file
>> system and the way paths are handled there. You expect a number of
>> things based on that, but maybe that was/is not the design goal.
>
> I do have a bias towards the Unix filesystem, and you're correct that
> all the examples I provided use the Unix file system, but I've tried to
> make sure that the changes I've made apply equally to Windows file
> systems (more below).
>
>
>> I said this before, but I am not sure we should interpret the argument of #/.
>
> I'll come back to this later :-)
>
>
>> I think that
>>
>>   FileLocator root / 'foo' / 'bar' / 'readme.txt'.
>>
>> is more abstract (fitting to the original goal) then
>>
>>   '/foo' asFileReference / 'bar/readme.txt'.
>>
>> because it totally avoids a reference to the platform dependent path separator.
>
> I wasn't trying to suggest that code would normally be written this way.
> It is more likely that '/foo' asFileReference is created early in the
> code, and much later an input is supplied which is 'bar/readme.txt'.  At
> that point, the natural thing to do is send #/ as above.
>
> The problem with only supporting the first case above is that it puts
> the onus of parsing the string on the user.  We could add a another
> method that parses the supplied string, but that will just make the
> interface more confusing, and we already have a steady stream of
> messages to the list from people getting confused by this.
>
>
>> The same goes for '..' as
>>
>>   (FileLocator root / 'foo' / 'bar' / 'down') parent / 'readme.txt'.
>>
>> is a more object oriented way to write '/foo/bar/down/../readme.txt', IMO.
>
> Agreed, but as mentioned above, we also have to deal with input strings
> from external sources.
>
>
>> Windows is an important target for Pharo, they use $\ not $/.
>
> The patch handles that, so on windows I can do:
>
> ('C:\cygwin' asFileReference / 'usr\local\bin') parent  " File @ C:\cygwin\usr\local"
>
> (just to be clear, this is an example of how the strings are handled,
> I'm not expecting anyone to write code like the above).
>
>
>> Do you see my point ?
>
> I think so, but... :-)
>
>
>> BTW, I consider the fact that current,
>>
>>  '/foo' asFileReference / 'bar/readme.txt'
>>
>> works with #exists even though 'bar/readme.txt' was not fully
>> parsed/resolved a bug, or a happy coincidence at best.
>
> I agree that it is a happy coincedence, but it is also intuitive.
>
>
>> Another point is the distinction between internal/external and/or
>> concrete/abstract paths. Should something like '..' remain part of a
>> path. Never, always, only when it can be resolved ? What about special
>> characters ?
>
> I'm arguing that '..' should be left in the path unless explicitly told
> to be removed, so that things like symbolic links are handled properly
> (by the file system itself).  (Thanks to Denis for reminding me about
> this in an earlier email thread)
>
> While it's nice to be completely general, is there a supported file
> system that doesn't interpret ".." as the parent directory?
>
> As an aside, another patch I'll be submitting fixes symbolic link
> handling so we could conveivably write a #canonicalizeOnFileSystem that
> would properly handle 'symboliclink/..', but it would be slow.
>
>
>> Should we support ~ ? Sometimes I would like that too, but maybe
>>
>>   FileLocator home
>>
>> is enough.
>
> As Subbu points out in a later message, things like '~' are handled by
> the shell.  We could consider adding something like #expand to handle
> '~' and shell variables, e.g. $HOME, but I'd make that a separate patch.
>
> It would also need to be done carefully as $ and ~ are valid characters
> within file names (only / and the null character are not allowed in
> Posix file systems, Windows has more characters that are not allowed).
>
>
>
>> All this being said, I think FileSystem can and should be improved,
>> but carefully.
>>
>> It would be good if more people joined this discussion.
>>
>> Sven
>
> I think I understand the original goals, and I don't disagree with them,
> however:
>
> 1. they are based on an arguably incorrect assumption (the
> separator character can be part of a file name),
>
> 2. they've caused quite a bit of confusion due to unexpected behaviour,
>
> 3. if support for a file system that allows the separator in file names
> is ever added, we will still have to be able to parse a full path
> correctly (which the current code wouldn't do)
>
> 4. I don't think these changes conflict with the original goals, all the
> examples you provided above still function exactly the same.
>
> Does that help reduce your uneasiness?
>
> Thanks again,
> Alistair
>
>
>
>
>> > On 26 Jun 2017, at 10:12, Alistair Grant <[hidden email]> wrote:
>> >
>> > Pharo's FileSystem behaviour is currently somewhat inconsistent in its
>> > treatment of path strings.
>> >
>> > To be able to demonstrate the limitations and fixes, I'm assuming that
>> > the following files and directories have been created (/dev/shm is a
>> > memory based file system in Ubuntu):
>> >
>> > cd /dev/shm
>> > mkdir -p d1/d2/d3/d4
>> > touch d1/t1.txt d1/d2/t2.txt d1/d2/d3/t3.txt d1/d2/d3/d4/t4.txt
>> > pushd d1 && ln -s d2/d3 ./s3 && popd
>> >
>> >
>> >
>> > 1. Path's are canonicalised during initial creation, but are not when
>> > extending the path, i.e. sending #/
>> >
>> > E.g.
>> >
>> > '/dev/shm/d1/d2/../t1.txt' asFileReference  " File @ /dev/shm/d1/t1.txt"
>> >
>> > '/dev/shm/d1/' asFileReference / 'd2/../t1.txt'  " File @ /dev/shm/d1/d2/../t1.txt"
>> >
>> > Automatic canonicalisation is problematic as the code doesn't handle
>> > symbolic links in the path name:
>> >
>> > ('/dev/shm/d1/' asFileReference / 's3/../t2.txt') exists  " true (correct answer)"
>> >
>> > '/dev/shm/d1/s3/../t2.txt' asFileReference exists  " false (wrong answer)"
>> >
>> >
>> > 2. In an attempt to be completely general, the argument to #/ is assumed
>> > to be a single directory / file.  This is incorrect as the path
>> > delimiter is not allowed to be part of the file name.  The result is
>> > that path comparisons and operations such as #parent give unexpected
>> > results.
>> >
>> > E.g.
>> >
>> > ('/dev/shm/d1/' asFileReference / 'd2/d3/t3.txt') exists  " true"
>> >
>> > ('/dev/shm/d1/' asFileReference / 'd2/d3/t3.txt') parent  " File @ /dev/shm/d1"
>> >
>> >
>> > My PR modified FileSystem so that:
>> >
>> > 1. Canonicalisation is separated out as an operation that has to be
>> > manually requested:
>> >
>> > '/dev/shm/d1/d2/../t1.txt' asFileReference  " File @ /dev/shm/d1/d2/../t1.txt"
>> >
>> > '/dev/shm/d1/d2/../t1.txt' asFileReference canonicalize  " File @ /dev/shm/d1/t1.txt"
>> >
>> >
>> > 2. The argument to #/ can be either a single file / directory name or a
>> > path.
>> >
>> > ('/dev/shm/d1/' asFileReference / 'd2/d3/t3.txt') parent  " File @ /dev/shm/d1/d2/d3"
>> >
>> >
>> >
>> > 3. Adds unit tests to cover the changed behaviour.
>> >
>> >
>> > While I believe that these changes improve Pharo overall, making it more
>> > intuitive and consistent, modifying the path creation to remove
>> > canonicalisation is not backward compatible and requires a change to
>> > PathTest>>testRelativeFromStringNormalization and
>> > PathTest>>testRelativeFromStringNormalizationParent to manually
>> > canonicalise the paths.
>> >
>> > Before I submit the patch, does anyone have any strong objections to the
>> > changes as described above?
>> >
>> > Thanks,
>> > Alistair
>

Reply | Threaded
Open this post in threaded view
|

Re: FileReference>>/ and path canonicalisation

alistairgrant
In reply to this post by Stephane Ducasse-3
Hi Stef,

On Fri, Jun 30, 2017 at 09:46:44AM +0200, Stephane Ducasse wrote:
> Hi alistair
>
> I do not know if this is me that wrote a wrong path class comment.
> You should consider that theere were nearly no comment at all and I
> started to try to give more love to this great library.
>
> Stef

No problem, I wasn't trying to blame anyone.  The comments appear to
align with the design goals of the class.  And I really do agree with
the goals, it is just that given the confusion it creates, in this
particular case it is more practical to parse the strings.

Actually issue 18042[1] offers an alternative approach, which is to enforce
not allowing the directory delimiter.  I would extend the error message
to suggest using #resolve: instead.  I think that following the
programmer's rule of being strict in what you write and forgiving in
what you read, parsing the string is more practical, but I'm keen to see
what everyone else thinks.

[1] https://pharo.fogbugz.com/f/cases/18042/FileSystem-a-file-doesn-t-exist-but-still-exists

> I'm not expert of file system. Now I like your idea to have explicit
> messages such as canonalize or expand.

:-)

Cheers,
Alistair



> On Tue, Jun 27, 2017 at 8:56 PM, Alistair Grant <[hidden email]> wrote:
> > Hi Sven,
> >
> > On Tue, Jun 27, 2017 at 01:31:42PM +0200, Sven Van Caekenberghe wrote:
> >> Hi Alistair,
> >>
> >> I think it is great that you are working on FileSystem and are trying
> >> to contribute. I appreciate your effort to find consensus.
> >
> > :-)
> >
> >
> >> But, and please take this as positive criticism, I feel a bit uneasy
> >> when I read your reasoning, but maybe I am wrong.
> >
> > Thanks for taking the trouble to reply.  Hopefully I can address some of
> > your unease below.
> >
> >
> >> You see, the way I understand FileSystem (what I think was the
> >> original design idea), it is a cross platform abstraction over
> >> concrete file systems and paths.
> >
> > It does appear to be the case that effort was put in to avoid specifying
> > a separator, but it makes an assumption that is incorrect. The class
> > comments of path state that the seperator can be part of a file name,
> > and this is incorrect for every disk file system Pharo supports (as far
> > as I know).
> >
> >
> >
> >> A FileReference holds a FileSystem and a Path. A Path is just a
> >> collection of elements that leads to a location, the leaf possibly
> >> being interpreted as a file with an extension (although that last
> >> point is just a convention). It seems great effort was put into
> >> avoiding the use of separators.
> >
> >> Pharo is an object oriented system with appropriate abstractions,
> >> FileSystem gives us one approach to this difficult problem.
> >>
> >> In your reasoning, your truth, your reference is always the Unix file
> >> system and the way paths are handled there. You expect a number of
> >> things based on that, but maybe that was/is not the design goal.
> >
> > I do have a bias towards the Unix filesystem, and you're correct that
> > all the examples I provided use the Unix file system, but I've tried to
> > make sure that the changes I've made apply equally to Windows file
> > systems (more below).
> >
> >
> >> I said this before, but I am not sure we should interpret the argument of #/.
> >
> > I'll come back to this later :-)
> >
> >
> >> I think that
> >>
> >>   FileLocator root / 'foo' / 'bar' / 'readme.txt'.
> >>
> >> is more abstract (fitting to the original goal) then
> >>
> >>   '/foo' asFileReference / 'bar/readme.txt'.
> >>
> >> because it totally avoids a reference to the platform dependent path separator.
> >
> > I wasn't trying to suggest that code would normally be written this way.
> > It is more likely that '/foo' asFileReference is created early in the
> > code, and much later an input is supplied which is 'bar/readme.txt'.  At
> > that point, the natural thing to do is send #/ as above.
> >
> > The problem with only supporting the first case above is that it puts
> > the onus of parsing the string on the user.  We could add a another
> > method that parses the supplied string, but that will just make the
> > interface more confusing, and we already have a steady stream of
> > messages to the list from people getting confused by this.
> >
> >
> >> The same goes for '..' as
> >>
> >>   (FileLocator root / 'foo' / 'bar' / 'down') parent / 'readme.txt'.
> >>
> >> is a more object oriented way to write '/foo/bar/down/../readme.txt', IMO.
> >
> > Agreed, but as mentioned above, we also have to deal with input strings
> > from external sources.
> >
> >
> >> Windows is an important target for Pharo, they use $\ not $/.
> >
> > The patch handles that, so on windows I can do:
> >
> > ('C:\cygwin' asFileReference / 'usr\local\bin') parent  " File @ C:\cygwin\usr\local"
> >
> > (just to be clear, this is an example of how the strings are handled,
> > I'm not expecting anyone to write code like the above).
> >
> >
> >> Do you see my point ?
> >
> > I think so, but... :-)
> >
> >
> >> BTW, I consider the fact that current,
> >>
> >>  '/foo' asFileReference / 'bar/readme.txt'
> >>
> >> works with #exists even though 'bar/readme.txt' was not fully
> >> parsed/resolved a bug, or a happy coincidence at best.
> >
> > I agree that it is a happy coincedence, but it is also intuitive.
> >
> >
> >> Another point is the distinction between internal/external and/or
> >> concrete/abstract paths. Should something like '..' remain part of a
> >> path. Never, always, only when it can be resolved ? What about special
> >> characters ?
> >
> > I'm arguing that '..' should be left in the path unless explicitly told
> > to be removed, so that things like symbolic links are handled properly
> > (by the file system itself).  (Thanks to Denis for reminding me about
> > this in an earlier email thread)
> >
> > While it's nice to be completely general, is there a supported file
> > system that doesn't interpret ".." as the parent directory?
> >
> > As an aside, another patch I'll be submitting fixes symbolic link
> > handling so we could conveivably write a #canonicalizeOnFileSystem that
> > would properly handle 'symboliclink/..', but it would be slow.
> >
> >
> >> Should we support ~ ? Sometimes I would like that too, but maybe
> >>
> >>   FileLocator home
> >>
> >> is enough.
> >
> > As Subbu points out in a later message, things like '~' are handled by
> > the shell.  We could consider adding something like #expand to handle
> > '~' and shell variables, e.g. $HOME, but I'd make that a separate patch.
> >
> > It would also need to be done carefully as $ and ~ are valid characters
> > within file names (only / and the null character are not allowed in
> > Posix file systems, Windows has more characters that are not allowed).
> >
> >
> >
> >> All this being said, I think FileSystem can and should be improved,
> >> but carefully.
> >>
> >> It would be good if more people joined this discussion.
> >>
> >> Sven
> >
> > I think I understand the original goals, and I don't disagree with them,
> > however:
> >
> > 1. they are based on an arguably incorrect assumption (the
> > separator character can be part of a file name),
> >
> > 2. they've caused quite a bit of confusion due to unexpected behaviour,
> >
> > 3. if support for a file system that allows the separator in file names
> > is ever added, we will still have to be able to parse a full path
> > correctly (which the current code wouldn't do)
> >
> > 4. I don't think these changes conflict with the original goals, all the
> > examples you provided above still function exactly the same.
> >
> > Does that help reduce your uneasiness?
> >
> > Thanks again,
> > Alistair
> >
> >
> >
> >
> >> > On 26 Jun 2017, at 10:12, Alistair Grant <[hidden email]> wrote:
> >> >
> >> > Pharo's FileSystem behaviour is currently somewhat inconsistent in its
> >> > treatment of path strings.
> >> >
> >> > To be able to demonstrate the limitations and fixes, I'm assuming that
> >> > the following files and directories have been created (/dev/shm is a
> >> > memory based file system in Ubuntu):
> >> >
> >> > cd /dev/shm
> >> > mkdir -p d1/d2/d3/d4
> >> > touch d1/t1.txt d1/d2/t2.txt d1/d2/d3/t3.txt d1/d2/d3/d4/t4.txt
> >> > pushd d1 && ln -s d2/d3 ./s3 && popd
> >> >
> >> >
> >> >
> >> > 1. Path's are canonicalised during initial creation, but are not when
> >> > extending the path, i.e. sending #/
> >> >
> >> > E.g.
> >> >
> >> > '/dev/shm/d1/d2/../t1.txt' asFileReference  " File @ /dev/shm/d1/t1.txt"
> >> >
> >> > '/dev/shm/d1/' asFileReference / 'd2/../t1.txt'  " File @ /dev/shm/d1/d2/../t1.txt"
> >> >
> >> > Automatic canonicalisation is problematic as the code doesn't handle
> >> > symbolic links in the path name:
> >> >
> >> > ('/dev/shm/d1/' asFileReference / 's3/../t2.txt') exists  " true (correct answer)"
> >> >
> >> > '/dev/shm/d1/s3/../t2.txt' asFileReference exists  " false (wrong answer)"
> >> >
> >> >
> >> > 2. In an attempt to be completely general, the argument to #/ is assumed
> >> > to be a single directory / file.  This is incorrect as the path
> >> > delimiter is not allowed to be part of the file name.  The result is
> >> > that path comparisons and operations such as #parent give unexpected
> >> > results.
> >> >
> >> > E.g.
> >> >
> >> > ('/dev/shm/d1/' asFileReference / 'd2/d3/t3.txt') exists  " true"
> >> >
> >> > ('/dev/shm/d1/' asFileReference / 'd2/d3/t3.txt') parent  " File @ /dev/shm/d1"
> >> >
> >> >
> >> > My PR modified FileSystem so that:
> >> >
> >> > 1. Canonicalisation is separated out as an operation that has to be
> >> > manually requested:
> >> >
> >> > '/dev/shm/d1/d2/../t1.txt' asFileReference  " File @ /dev/shm/d1/d2/../t1.txt"
> >> >
> >> > '/dev/shm/d1/d2/../t1.txt' asFileReference canonicalize  " File @ /dev/shm/d1/t1.txt"
> >> >
> >> >
> >> > 2. The argument to #/ can be either a single file / directory name or a
> >> > path.
> >> >
> >> > ('/dev/shm/d1/' asFileReference / 'd2/d3/t3.txt') parent  " File @ /dev/shm/d1/d2/d3"
> >> >
> >> >
> >> >
> >> > 3. Adds unit tests to cover the changed behaviour.
> >> >
> >> >
> >> > While I believe that these changes improve Pharo overall, making it more
> >> > intuitive and consistent, modifying the path creation to remove
> >> > canonicalisation is not backward compatible and requires a change to
> >> > PathTest>>testRelativeFromStringNormalization and
> >> > PathTest>>testRelativeFromStringNormalizationParent to manually
> >> > canonicalise the paths.
> >> >
> >> > Before I submit the patch, does anyone have any strong objections to the
> >> > changes as described above?
> >> >
> >> > Thanks,
> >> > Alistair
> >
>

Reply | Threaded
Open this post in threaded view
|

Re: FileReference>>/ and path canonicalisation

Stephane Ducasse-3
I did not take any blame in wht you say. I just wanted to explain that
yes comments can be wrong :)

I'm super happy to get some brains looking at improving FS.

I'm fighting to see how we can start to integrate faster changes in P70.

On Fri, Jun 30, 2017 at 10:58 AM, Alistair Grant <[hidden email]> wrote:

> Hi Stef,
>
> On Fri, Jun 30, 2017 at 09:46:44AM +0200, Stephane Ducasse wrote:
>> Hi alistair
>>
>> I do not know if this is me that wrote a wrong path class comment.
>> You should consider that theere were nearly no comment at all and I
>> started to try to give more love to this great library.
>>
>> Stef
>
> No problem, I wasn't trying to blame anyone.  The comments appear to
> align with the design goals of the class.  And I really do agree with
> the goals, it is just that given the confusion it creates, in this
> particular case it is more practical to parse the strings.
>
> Actually issue 18042[1] offers an alternative approach, which is to enforce
> not allowing the directory delimiter.  I would extend the error message
> to suggest using #resolve: instead.  I think that following the
> programmer's rule of being strict in what you write and forgiving in
> what you read, parsing the string is more practical, but I'm keen to see
> what everyone else thinks.
>
> [1] https://pharo.fogbugz.com/f/cases/18042/FileSystem-a-file-doesn-t-exist-but-still-exists
>
>> I'm not expert of file system. Now I like your idea to have explicit
>> messages such as canonalize or expand.
>
> :-)
>
> Cheers,
> Alistair
>
>
>
>> On Tue, Jun 27, 2017 at 8:56 PM, Alistair Grant <[hidden email]> wrote:
>> > Hi Sven,
>> >
>> > On Tue, Jun 27, 2017 at 01:31:42PM +0200, Sven Van Caekenberghe wrote:
>> >> Hi Alistair,
>> >>
>> >> I think it is great that you are working on FileSystem and are trying
>> >> to contribute. I appreciate your effort to find consensus.
>> >
>> > :-)
>> >
>> >
>> >> But, and please take this as positive criticism, I feel a bit uneasy
>> >> when I read your reasoning, but maybe I am wrong.
>> >
>> > Thanks for taking the trouble to reply.  Hopefully I can address some of
>> > your unease below.
>> >
>> >
>> >> You see, the way I understand FileSystem (what I think was the
>> >> original design idea), it is a cross platform abstraction over
>> >> concrete file systems and paths.
>> >
>> > It does appear to be the case that effort was put in to avoid specifying
>> > a separator, but it makes an assumption that is incorrect. The class
>> > comments of path state that the seperator can be part of a file name,
>> > and this is incorrect for every disk file system Pharo supports (as far
>> > as I know).
>> >
>> >
>> >
>> >> A FileReference holds a FileSystem and a Path. A Path is just a
>> >> collection of elements that leads to a location, the leaf possibly
>> >> being interpreted as a file with an extension (although that last
>> >> point is just a convention). It seems great effort was put into
>> >> avoiding the use of separators.
>> >
>> >> Pharo is an object oriented system with appropriate abstractions,
>> >> FileSystem gives us one approach to this difficult problem.
>> >>
>> >> In your reasoning, your truth, your reference is always the Unix file
>> >> system and the way paths are handled there. You expect a number of
>> >> things based on that, but maybe that was/is not the design goal.
>> >
>> > I do have a bias towards the Unix filesystem, and you're correct that
>> > all the examples I provided use the Unix file system, but I've tried to
>> > make sure that the changes I've made apply equally to Windows file
>> > systems (more below).
>> >
>> >
>> >> I said this before, but I am not sure we should interpret the argument of #/.
>> >
>> > I'll come back to this later :-)
>> >
>> >
>> >> I think that
>> >>
>> >>   FileLocator root / 'foo' / 'bar' / 'readme.txt'.
>> >>
>> >> is more abstract (fitting to the original goal) then
>> >>
>> >>   '/foo' asFileReference / 'bar/readme.txt'.
>> >>
>> >> because it totally avoids a reference to the platform dependent path separator.
>> >
>> > I wasn't trying to suggest that code would normally be written this way.
>> > It is more likely that '/foo' asFileReference is created early in the
>> > code, and much later an input is supplied which is 'bar/readme.txt'.  At
>> > that point, the natural thing to do is send #/ as above.
>> >
>> > The problem with only supporting the first case above is that it puts
>> > the onus of parsing the string on the user.  We could add a another
>> > method that parses the supplied string, but that will just make the
>> > interface more confusing, and we already have a steady stream of
>> > messages to the list from people getting confused by this.
>> >
>> >
>> >> The same goes for '..' as
>> >>
>> >>   (FileLocator root / 'foo' / 'bar' / 'down') parent / 'readme.txt'.
>> >>
>> >> is a more object oriented way to write '/foo/bar/down/../readme.txt', IMO.
>> >
>> > Agreed, but as mentioned above, we also have to deal with input strings
>> > from external sources.
>> >
>> >
>> >> Windows is an important target for Pharo, they use $\ not $/.
>> >
>> > The patch handles that, so on windows I can do:
>> >
>> > ('C:\cygwin' asFileReference / 'usr\local\bin') parent  " File @ C:\cygwin\usr\local"
>> >
>> > (just to be clear, this is an example of how the strings are handled,
>> > I'm not expecting anyone to write code like the above).
>> >
>> >
>> >> Do you see my point ?
>> >
>> > I think so, but... :-)
>> >
>> >
>> >> BTW, I consider the fact that current,
>> >>
>> >>  '/foo' asFileReference / 'bar/readme.txt'
>> >>
>> >> works with #exists even though 'bar/readme.txt' was not fully
>> >> parsed/resolved a bug, or a happy coincidence at best.
>> >
>> > I agree that it is a happy coincedence, but it is also intuitive.
>> >
>> >
>> >> Another point is the distinction between internal/external and/or
>> >> concrete/abstract paths. Should something like '..' remain part of a
>> >> path. Never, always, only when it can be resolved ? What about special
>> >> characters ?
>> >
>> > I'm arguing that '..' should be left in the path unless explicitly told
>> > to be removed, so that things like symbolic links are handled properly
>> > (by the file system itself).  (Thanks to Denis for reminding me about
>> > this in an earlier email thread)
>> >
>> > While it's nice to be completely general, is there a supported file
>> > system that doesn't interpret ".." as the parent directory?
>> >
>> > As an aside, another patch I'll be submitting fixes symbolic link
>> > handling so we could conveivably write a #canonicalizeOnFileSystem that
>> > would properly handle 'symboliclink/..', but it would be slow.
>> >
>> >
>> >> Should we support ~ ? Sometimes I would like that too, but maybe
>> >>
>> >>   FileLocator home
>> >>
>> >> is enough.
>> >
>> > As Subbu points out in a later message, things like '~' are handled by
>> > the shell.  We could consider adding something like #expand to handle
>> > '~' and shell variables, e.g. $HOME, but I'd make that a separate patch.
>> >
>> > It would also need to be done carefully as $ and ~ are valid characters
>> > within file names (only / and the null character are not allowed in
>> > Posix file systems, Windows has more characters that are not allowed).
>> >
>> >
>> >
>> >> All this being said, I think FileSystem can and should be improved,
>> >> but carefully.
>> >>
>> >> It would be good if more people joined this discussion.
>> >>
>> >> Sven
>> >
>> > I think I understand the original goals, and I don't disagree with them,
>> > however:
>> >
>> > 1. they are based on an arguably incorrect assumption (the
>> > separator character can be part of a file name),
>> >
>> > 2. they've caused quite a bit of confusion due to unexpected behaviour,
>> >
>> > 3. if support for a file system that allows the separator in file names
>> > is ever added, we will still have to be able to parse a full path
>> > correctly (which the current code wouldn't do)
>> >
>> > 4. I don't think these changes conflict with the original goals, all the
>> > examples you provided above still function exactly the same.
>> >
>> > Does that help reduce your uneasiness?
>> >
>> > Thanks again,
>> > Alistair
>> >
>> >
>> >
>> >
>> >> > On 26 Jun 2017, at 10:12, Alistair Grant <[hidden email]> wrote:
>> >> >
>> >> > Pharo's FileSystem behaviour is currently somewhat inconsistent in its
>> >> > treatment of path strings.
>> >> >
>> >> > To be able to demonstrate the limitations and fixes, I'm assuming that
>> >> > the following files and directories have been created (/dev/shm is a
>> >> > memory based file system in Ubuntu):
>> >> >
>> >> > cd /dev/shm
>> >> > mkdir -p d1/d2/d3/d4
>> >> > touch d1/t1.txt d1/d2/t2.txt d1/d2/d3/t3.txt d1/d2/d3/d4/t4.txt
>> >> > pushd d1 && ln -s d2/d3 ./s3 && popd
>> >> >
>> >> >
>> >> >
>> >> > 1. Path's are canonicalised during initial creation, but are not when
>> >> > extending the path, i.e. sending #/
>> >> >
>> >> > E.g.
>> >> >
>> >> > '/dev/shm/d1/d2/../t1.txt' asFileReference  " File @ /dev/shm/d1/t1.txt"
>> >> >
>> >> > '/dev/shm/d1/' asFileReference / 'd2/../t1.txt'  " File @ /dev/shm/d1/d2/../t1.txt"
>> >> >
>> >> > Automatic canonicalisation is problematic as the code doesn't handle
>> >> > symbolic links in the path name:
>> >> >
>> >> > ('/dev/shm/d1/' asFileReference / 's3/../t2.txt') exists  " true (correct answer)"
>> >> >
>> >> > '/dev/shm/d1/s3/../t2.txt' asFileReference exists  " false (wrong answer)"
>> >> >
>> >> >
>> >> > 2. In an attempt to be completely general, the argument to #/ is assumed
>> >> > to be a single directory / file.  This is incorrect as the path
>> >> > delimiter is not allowed to be part of the file name.  The result is
>> >> > that path comparisons and operations such as #parent give unexpected
>> >> > results.
>> >> >
>> >> > E.g.
>> >> >
>> >> > ('/dev/shm/d1/' asFileReference / 'd2/d3/t3.txt') exists  " true"
>> >> >
>> >> > ('/dev/shm/d1/' asFileReference / 'd2/d3/t3.txt') parent  " File @ /dev/shm/d1"
>> >> >
>> >> >
>> >> > My PR modified FileSystem so that:
>> >> >
>> >> > 1. Canonicalisation is separated out as an operation that has to be
>> >> > manually requested:
>> >> >
>> >> > '/dev/shm/d1/d2/../t1.txt' asFileReference  " File @ /dev/shm/d1/d2/../t1.txt"
>> >> >
>> >> > '/dev/shm/d1/d2/../t1.txt' asFileReference canonicalize  " File @ /dev/shm/d1/t1.txt"
>> >> >
>> >> >
>> >> > 2. The argument to #/ can be either a single file / directory name or a
>> >> > path.
>> >> >
>> >> > ('/dev/shm/d1/' asFileReference / 'd2/d3/t3.txt') parent  " File @ /dev/shm/d1/d2/d3"
>> >> >
>> >> >
>> >> >
>> >> > 3. Adds unit tests to cover the changed behaviour.
>> >> >
>> >> >
>> >> > While I believe that these changes improve Pharo overall, making it more
>> >> > intuitive and consistent, modifying the path creation to remove
>> >> > canonicalisation is not backward compatible and requires a change to
>> >> > PathTest>>testRelativeFromStringNormalization and
>> >> > PathTest>>testRelativeFromStringNormalizationParent to manually
>> >> > canonicalise the paths.
>> >> >
>> >> > Before I submit the patch, does anyone have any strong objections to the
>> >> > changes as described above?
>> >> >
>> >> > Thanks,
>> >> > Alistair
>> >
>>
>