convenience methods for encoding external process calls

Alan Knight-2
One of the things that changed in 7.7 was that forking an external process on Windows via shOne: started consistently using Unicode (i.e. UTF-16) and passing the /u argument to the shell. Not all programs respect this, so while some commands started giving back correct results for the first time, others started giving back badly encoded results. This has caused a fair amount of confusion.

It's easy to interpret the process results with any encoding you want, by doing something like:
   | p |
   p := ExternalProcess new.
   p encoding: #JIS.   "decode the process output as JIS"
   ^p fork: 'someProgram' arguments: (Array with: '123').

but people would need to know that, when they're used to just calling shOne:. So it seems like a good idea to provide a convenience API that does what people expect. The question is what to call it. For example:

WinProcess>>shOneOEM: aString
    "Run aString via the command interpreter, decoding the output with the OEM code page."
    self encoding: (OSSystemSupport concreteClass new GetOEMCP printString asSymbol).
    ^self fork: self getCommandLineInterpreter arguments: (Array with: '/c' with: aString).

This would run the command using the OEM code page, which is what programs are most likely to use if they're not using UTF-16 like good citizens. On North American Windows, that means code page 437.

But if people aren't aware of the distinction, they're probably not going to know that OEM is the right term. Other suggestions included:

shOne8Bit:
shOneNonUnicode:
or even just
cmd:

Any preferences, or better ideas? I thought it might also be worthwhile to provide shOneBinary:, which is the same thing except that it sets the encoding to #binary, since people might not be aware they can do that. Is that worthwhile, or just clutter?
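For a sense of how small shOneBinary: would be, here is a minimal sketch modelled on the shOneOEM: example above. It is hypothetical (not an existing VisualWorks method) and assumes that setting the encoding to #binary means no text decoding is applied to the process output.

```smalltalk
WinProcess>>shOneBinary: aString
    "Hypothetical sketch: run aString through the command interpreter
     with /c, as in the proposed shOneOEM:, but set the encoding to
     #binary so that no text decoding is applied to the output."
    self encoding: #binary.
    ^self fork: self getCommandLineInterpreter arguments: (Array with: '/c' with: aString)
```

A caller would then have to decode the result itself, using whatever encoding the program actually writes.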


--
Alan Knight [|], Engineering Manager, Cincom Smalltalk

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Re: convenience methods for encoding external process calls

Eliot Miranda-2
Why not add shOne:encoding:, and implement shOne: in terms of it using the default encoding? Then you've got two methods, not N.
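A minimal sketch of the shOne:encoding: method suggested here, reusing the pieces of the shOneOEM: example. The selector is Eliot's proposal rather than an existing method, and the body is an assumption; shOne: would then call it with whatever default encoding was chosen.

```smalltalk
WinProcess>>shOne: aString encoding: encodingSymbol
    "Hypothetical sketch: decode the command's output with encodingSymbol.
     It always passes only /c and never adds the /u switch, so on its own
     it does not ask the shell for UTF-16 output."
    self encoding: encodingSymbol.
    ^self fork: self getCommandLineInterpreter arguments: (Array with: '/c' with: aString)
```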

In reply to Alan Knight's post of Thu, Apr 22, 2010 at 1:59 PM.

Re: convenience methods for encoding external process calls

Alan Knight-2
Well, there isn't a single default encoding; there are two major possibilities, UTF-16 and the OEM code page. Having shOne: work in terms of a default encoding is effectively what we do now, and it has clearly confused people.

Also, the shell takes a command-line argument (/u) that tells it whether to use UTF-16, which matters for the programs that honour it. So if you're not using UTF-16, you should pass it something different. Either the syntax gets more complicated again, or the implementation needs a case statement on the encoding that adds or removes arguments, which seems uglier.
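The case statement described here might look something like the following hypothetical helper, which chooses the cmd.exe switches from the encoding. The selector commandArgumentsFor:encoding: is invented for illustration, and #'UTF-16' is an assumed name for the UTF-16 encoding symbol.

```smalltalk
WinProcess>>commandArgumentsFor: aString encoding: encodingSymbol
    "Hypothetical helper: add the /u switch only when the output should be
     UTF-16. #'UTF-16' is an assumed encoding symbol name. Switches must
     precede /c, because cmd.exe treats everything after /c as the command."
    ^encodingSymbol = #'UTF-16'
        ifTrue: [Array with: '/u' with: '/c' with: aString]
        ifFalse: [Array with: '/c' with: aString]
```

Every encoding-aware entry point would have to route through something like this, which is the extra branching the two-method design pushes into the implementation.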

Finally, the most laborious part is getting the encoding, so the convenience method is less convenient if you have to write
         OSSystemSupport concreteClass new GetOEMCP printString asSymbol
for one of its arguments each time.
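To make the point concrete, here is what a call site would look like if the two-argument API existed. shOne:encoding: is Eliot's proposed selector, not an existing method, and 'dir' is just an example command.

```smalltalk
| output |
"Hypothetical: assumes the proposed shOne:encoding: exists on WinProcess.
 The OEM code page has to be looked up and turned into a symbol at every call."
output := WinProcess new
    shOne: 'dir'
    encoding: (OSSystemSupport concreteClass new GetOEMCP printString asSymbol).
```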

In reply to Eliot Miranda's post of 2010-04-22, 05:12 PM.

Re: convenience methods for encoding external process calls

Holger Kleinsorgen-4
In reply to this post by Alan Knight-2
The selector and class names of ExternalProcess have always confused me:

- #cshOne: - as the comment already says, it's Unix-specific. And what's
the meaning of the suffix "One"? My first thought was "channel one, aka
stdout (1)", but it returns the output of channel stderr (2) if there is
any output.

- #fork: - it waits for the command to finish, so calling the method
"fork" is misleading. Forking is actually done in #startProcess:arguments:.

- #shOne: - same issues as #cshOne:. "sh" is a Unix-specific name, too.

- ExternalProcess is command-line-centric

     WinProcess new startProcess: 'notepad' arguments: #()
     Win32SystemSupport CreateProcess: nil arguments: 'notepad'

Suggestions for new selectors:

1. replacement for fork:arguments:

   runExecutable:arguments:
     or
   runCommandLineExecutable:arguments:
     or
   run:arguments:

2. replacement for shOne:

   runCommandLine:

3. additional method to solve the encoding issue:

   runCommandLine: aString forceByteEncoding: aBoolean

The implementation of this method could use two internal methods (e.g.
#runCommandLineWithDefaultEncoding: and
#runCommandLineWithByteEncoding:), or use some switches, or whatever ;)

The important part is that the selector does not:
- mention any specific encoding
- use abbreviations like "sh"
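A minimal sketch of the proposed runCommandLine:forceByteEncoding:, dispatching to the two internal methods named above. All three selectors are Holger's proposals rather than existing methods; delegating the default case to shOne: and using the OEM code page for the byte case (following Alan's shOneOEM: example) are assumptions.

```smalltalk
WinProcess>>runCommandLine: aString forceByteEncoding: aBoolean
    "Hypothetical: choose the decoding strategy from aBoolean."
    ^aBoolean
        ifTrue: [self runCommandLineWithByteEncoding: aString]
        ifFalse: [self runCommandLineWithDefaultEncoding: aString]

WinProcess>>runCommandLineWithDefaultEncoding: aString
    "Hypothetical: keep the current shOne: behaviour (UTF-16 with /u)."
    ^self shOne: aString

WinProcess>>runCommandLineWithByteEncoding: aString
    "Hypothetical: decode the output with the OEM code page, without /u."
    self encoding: (OSSystemSupport concreteClass new GetOEMCP printString asSymbol).
    ^self fork: self getCommandLineInterpreter arguments: (Array with: '/c' with: aString)
```

The public selector names no specific encoding and uses no abbreviation, which is the constraint set out in the list above; the encoding details stay inside the private methods.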


Re: convenience methods for encoding external process calls

Alan Knight-2
Yes, the selectors carry a lot of historical baggage and aren't the clearest. The "One" in shOne: and cshOne: means running a single command and waiting for its result, as opposed to running a shell independently and feeding it commands. I like that the selectors are reasonably short. I'd also note that on WinProcess, shOne: is just a call to executeSingleCommand:, yet what everyone asks about is shOne:, and you don't even mention executeSingleCommand: among the selectors.
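Based on the description here, the WinProcess side of shOne: amounts to a one-line delegation. This is an illustrative sketch, not the shipped source, which may do more.

```smalltalk
WinProcess>>shOne: aString
    "Illustrative: run a single command, wait for it to finish, and answer
     its result, by delegating to executeSingleCommand:."
    ^self executeSingleCommand: aString
```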

I'm not sure what you mean by things being command-line centric.

In reply to Holger Kleinsorgen's post of 2010-04-24, 05:25 AM.