Need more consistent #lines

Igor Stasenko
The total number of lines in a string should be:
one + the number of line separators found.

Explanation: a 'String cr' or 'String crlf' stands for a 'line separator'.
By definition, a separator is a thing which separates two others. I
mean, there is no way you can use the term 'separator'
without having TWO things that it separates.

The problem is that #lines does not follow common sense:

(String empty, String cr, String empty) lines size 1

So our line separator "separates" one thing :)
This means that we overload the definition:
 - a line separator separates two lines, unless it is at the end of the text.

But think how dearly this 'unless' will cost in terms of implementation!
Everywhere you need to handle text, you have to add this
extra rule, and if you forget it,
you will be punished...


Here is what #lines produces for various inputs:

( 'A' ) lines size 1
( 'A', String cr ) lines size 1   !!oops!!.

( String cr,  'A' ) lines size 2
( 'A', String cr,  'A' ) lines size 2

( 'A', String cr,  'A', String cr ) lines size 2
( 'A', String cr,  'A', String cr ) lines size 2   !!oops!!

You may think this is fine, but let's look at the problem from another angle:

self assert: (string1, String cr, string2) lineCount >= 2

The 'string1, String cr, string2' above is an insertion operation:
we are inserting a line separator between two arbitrary strings.
Now, in what universe can the total number of lines fail to increase
after such an insertion?

So it is simply inconsistent. Or maybe you are happy to put extra code
everywhere you handle text, to do special handling when the last
character is a line separator?
But we can avoid all that mess in the first place if we obey common sense.
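A minimal workspace sketch of the rule proposed above (lines = 1 + number of separators), assuming that a lone CR, a lone LF, or a CR LF pair each count as one separator. `separatorLineCount` is a hypothetical name, not an existing Pharo method. The sketch then asserts the insertion invariant from the argument for a few sample strings, chosen so that string2 never starts with LF (otherwise the inserted CR would pair with it and be read as a single CRLF separator).

```smalltalk
"Hypothetical sketch of the proposed rule: lines = 1 + separators.
 A lone CR, a lone LF, or a CR LF pair each count as ONE separator."
| separatorLineCount |
separatorLineCount := [:aString |
	| count index |
	count := 1.
	index := 1.
	[index <= aString size] whileTrue: [
		| ch |
		ch := aString at: index.
		ch = Character cr
			ifTrue: [
				count := count + 1.
				"Skip the LF of a CR LF pair so it is not counted twice."
				(index < aString size and: [(aString at: index + 1) = Character lf])
					ifTrue: [index := index + 1]]
			ifFalse: [ch = Character lf ifTrue: [count := count + 1]].
		index := index + 1].
	count].

"Insertion invariant: joining two strings with one CR yields exactly
 lines(s1) + lines(s2), so the count always grows."
{ ''. 'A'. 'A', String cr, 'B' } do: [:s1 |
	{ ''. 'A' } do: [:s2 |
		self assert: (separatorLineCount value: s1, String cr, s2)
			= ((separatorLineCount value: s1) + (separatorLineCount value: s2))]].
```

Under this rule an empty string has one (empty) line, ('A', String cr) has 2 lines and ('A', String cr, 'A', String cr) has 3, where the #lines results listed above give 1 and 2.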

--
Best regards,
Igor Stasenko.


Re: Need more consistent #lines

Stéphane Ducasse
http://code.google.com/p/pharo/issues/detail?id=6807

Stef




Re: Need more consistent #lines

Levente Uzonyi-2
In reply to this post by Igor Stasenko
On Sun, 14 Oct 2012, Igor Stasenko wrote:

> ( 'A', String cr,  'A', String cr ) lines size 2
> ( 'A', String cr,  'A', String cr ) lines size 2   !!oops!!

The above two examples are the same. ;)

>
> You may think this is fine, but let's look at the problem from another angle:
>
> self assert: (string1, String cr, string2) lineCount >= 2
>
> The 'string1, String cr, string2' above is an insertion operation:
> we are inserting a line separator between two arbitrary strings.
> Now, in what universe can the total number of lines fail to increase
> after such an insertion?
>
> So it is simply inconsistent. Or maybe you are happy to put extra code
> everywhere you handle text, to do special handling when the last
> character is a line separator?
> But we can avoid all that mess in the first place if we obey common sense.

Read this before you do anything (wrong):

"There is also some confusion whether newlines terminate or separate
lines. If a newline is considered a separator, there will be no newline
after the last line of a file. The general convention on most systems is
to add a newline even after the last line, i.e. to treat newline as a
line terminator.[citation needed] Some programs have problems processing
the last line of a file if it is not newline terminated. Conversely,
programs that expect newline to be used as a separator will interpret a
final newline as starting a new (empty) line."

From: http://en.wikipedia.org/wiki/Newline
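A small sketch comparing the two readings in the quoted passage, assuming for simplicity that only a lone CR marks a line break; `breaks`, `separatorCount` and `terminatorCount` are hypothetical names. Under these assumptions the terminator column reproduces the #lines sizes Igor listed for the same inputs, while the separator column follows his proposed rule.

```smalltalk
"Hypothetical sketch: compare the separator and terminator readings.
 Only a lone CR is treated as a line break here."
| breaks separatorCount terminatorCount samples results |
breaks := [:aString | aString occurrencesOf: Character cr].
"Separator reading: N breaks separate N + 1 lines, so a final CR
 starts a new, empty line."
separatorCount := [:aString | (breaks value: aString) + 1].
"Terminator reading: every CR ends one line; text after the last CR
 counts as one more (unterminated) line, and an empty string has none."
terminatorCount := [:aString |
	(aString isEmpty or: [aString last = Character cr])
		ifTrue: [breaks value: aString]
		ifFalse: [(breaks value: aString) + 1]].
"The inputs from Igor's list, in the same order."
samples := {
	'A'.
	'A', String cr.
	String cr, 'A'.
	'A', String cr, 'A'.
	'A', String cr, 'A', String cr }.
results := samples collect: [:each |
	{ separatorCount value: each. terminatorCount value: each }].
"separator/terminator pairs: 1/1, 2/1, 2/2, 2/2, 3/2"
results
```

The two readings only disagree when the text ends with a break (and on the empty string, which has one empty line as a separator list but zero terminated lines).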


Levente




Re: Need more consistent #lines

Nicolas Cellier
In reply to this post by Stéphane Ducasse
Yes, that's weird.
But I think the inconsistency comes from Unix, which insists on having a
terminating LF.

(FileDirectory default fileNamed: 'foo') truncate; nextPut: $a; close.

$ od -c foo
0000000    a
0000001

$ wc -l foo
       0 foo

(FileDirectory default fileNamed: 'foo') truncate; nextPut: $a; lf; close.

$ od -c foo
0000000    a  \n
0000002

$ wc -l foo
       1 foo

In the past I had to take care to append a terminal line-ending
character to make sure some Unix scripts/tools would work.
For example, remember that not having a terminal line ending in C
source can lead to a warning or an error depending on your
compiler/flags...
Unfortunately, we have to admit that common sense regarding line
endings has been a bit perverted by our dear OS providers...

Nicolas



2012/10/14 Stéphane Ducasse <[hidden email]>:

> http://code.google.com/p/pharo/issues/detail?id=6807


Re: Need more consistent #lines

David T. Lewis
On Sun, Oct 14, 2012 at 02:23:43PM +0200, Nicolas Cellier wrote:
> Yes, that's weird.
> But I think the inconsistency comes from Unix, which insists on having a
> terminating LF.

Not so. The line end convention on Unix is consistent with what Igor is
expecting to see. But it's just a convention that is generally adopted by
programs written for Unix, that's all. The operating system does nothing
to enforce it.

> In the past I had to take care to append a terminal line-ending
> character to make sure some Unix scripts/tools would work.
> For example, remember that not having a terminal line ending in C
> source can lead to a warning or an error depending on your
> compiler/flags...
> Unfortunately, we have to admit that common sense regarding line
> endings has been a bit perverted by our dear OS providers...

Record-oriented file systems are quite common, and are still widely
used. If you have an operating system (such as Unix or Windows) without
record-oriented files, then common sense would tell you to adopt a
line ending convention and use it consistently in all programs that
need to read "records". Given that records in a file might be empty
or blank, it would also be good common sense to ensure that an empty
record at the end of a file does not go missing just because it
is at the end of the file.
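One way to illustrate this last point, assuming (as an example convention, not something stated in the thread) that LF terminates a record as on Unix: each LF closes exactly one record, so an empty final record (two LFs in a row at the end) is kept, and trailing text without a final LF is not dropped either. `readRecords` is a hypothetical helper.

```smalltalk
"Hypothetical sketch: read records from text where LF TERMINATES a record
 (assumed Unix convention). Each LF closes exactly one record, so an empty
 record right before the final LF is kept; trailing text without a final
 LF is still returned as the last record."
| readRecords |
readRecords := [:aString |
	| records current |
	records := OrderedCollection new.
	current := WriteStream on: String new.
	aString do: [:ch |
		ch = Character lf
			ifTrue: [
				"Close the current record, even if it is empty."
				records add: current contents.
				current := WriteStream on: String new]
			ifFalse: [current nextPut: ch]].
	"An unterminated last record is kept rather than dropped."
	current contents isEmpty ifFalse: [records add: current contents].
	records asArray].

readRecords value: 'a', String lf, String lf.  "two records: 'a' and ''"
readRecords value: 'a', String lf.             "one record: 'a'"
readRecords value: 'a'.                        "one record: 'a'"
```

A reader that instead treated LF as a separator would report three records for the first input ('a', '' and ''), which is exactly the kind of disagreement a single, consistently applied convention avoids.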

Dave



Re: Need more consistent #lines

Igor Stasenko
In reply to this post by Levente Uzonyi-2
On 14 October 2012 13:56, Levente Uzonyi <[hidden email]> wrote:

> On Sun, 14 Oct 2012, Igor Stasenko wrote:
>> ( 'A', String cr,  'A', String cr ) lines size 2
>> ( 'A', String cr,  'A', String cr ) lines size 2   !!oops!!
>
>
> The above two examples are the same. ;)
>
Oh, sorry, I meant
( 'A', String cr,  'A' ) lines size 2
( 'A', String cr,  'A', String cr ) lines size 2   !!oops!!

> Read this before you do anything (wrong):
>
> "There is also some confusion whether newlines terminate or separate lines.
> If a newline is considered a separator, there will be no newline after the
> last line of a file. The general convention on most systems is to add a
> newline even after the last line, i.e. to treat newline as a line
> terminator.[citation needed] Some programs have problems processing the last
> line of a file if it is not newline terminated. Conversely, programs that
> expect newline to be used as a separator will interpret a final newline as
> starting a new (empty) line."
>

I am more concerned about our little world in Smalltalk. For instance,
how we count the number of lines
in the text editor.
Let me demonstrate which way of counting lines is consistent
with the editor:

When a user opens a fresh editor with empty contents, he has a single
empty line for editing:

1: ''

Now he can start typing:

1: 'abc'

If you save this string into a file and load it back, you obviously
should show the same contents:

1: 'abc'

So, even if there's no 'newline' at the end, it is still 1 line of text.
If you don't keep the same perception between what you show to the user and
how the tools perceive it,
you will get inconsistency (with all its grave consequences, numerous
workarounds everywhere, etc. etc.)

'wc -l' shows you 0, while the editor shows 1, so there is no consistency.

If we favor the user's experience, then from this perspective the real liar
is wc, not the text editor.


Now, let's assume that the user continues editing that text and hits
"enter". No matter what character(s) you insert into the string
contents (lf/cr/crlf, whatever), you should now show TWO lines
available for editing:

1: 'abc'
2: ''

So, again, there is no question that the most consistent way of treating
cr/crlf/lf is as a line separator,
not as 'line terminating character(s)' or 'new line, but not always' bullshit.

Now, you could say that the editor doesn't need to be pedantic about it, and
could start not from an empty string
but from
'', String cr
contents.

But then users would lose the ability to produce files without a terminating
'cr', as well as empty files
(unless, of course, you put in extra code to remove the last 'cr' when filing
out, and add it back when filing in)...
But once you start doing something like that, it leads exactly to what
I want to prevent in the first place:
extra code/logic to deal with inconsistency.
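A minimal sketch of the editor round trip described above, assuming CR is the only separator: splitting on CR always yields at least one (possibly empty) line, and joining with CR restores the original string exactly, including the empty string and text without a trailing CR. `splitLines` and `joinLines` are hypothetical helpers, not Pharo API.

```smalltalk
"Hypothetical sketch: CR as a pure separator when loading text into an
 editor, and as the joiner when saving. Only a lone CR is handled."
| splitLines joinLines |
splitLines := [:aString |
	| result start |
	result := OrderedCollection new.
	start := 1.
	1 to: aString size do: [:i |
		(aString at: i) = Character cr ifTrue: [
			result add: (aString copyFrom: start to: i - 1).
			start := i + 1]].
	"Whatever follows the last separator is the last line, even if empty."
	result add: (aString copyFrom: start to: aString size).
	result asArray].
joinLines := [:lines |
	String streamContents: [:stream |
		lines
			do: [:each | stream nextPutAll: each]
			separatedBy: [stream nextPut: Character cr]]].

splitLines value: ''.                "one empty line, like a fresh editor"
splitLines value: 'abc'.             "one line: 'abc'"
splitLines value: 'abc', String cr.  "two lines: 'abc' and ''"
```

Because saving is just joinLines applied to what splitLines produced, no extra code is needed to strip or re-add a final CR when filing out and back in.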

--
Best regards,
Igor Stasenko.