Serving files


Serving files

Brian Brown-6

Hello all!

It's been a while since I did any Seaside work, but now I have a
project and am trying to get back into the swing of things....

I'm writing an online repository which will essentially serve out some
zip files. I remember some discussion on the groups a while back
regarding streaming files from the disk without loading them
completely into memory as they are served. Has anyone taken a crack at
this?
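For readers following the thread outside Squeak: streaming a file "without loading it completely into memory" comes down to copying it in fixed-size chunks. Below is a minimal Python sketch, not Seaside or Comanche code. It assumes a hypothetical `write` callable that stands in for the HTTP response stream.

```python
def stream_file(path, write, chunk_size=64 * 1024):
    """Copy the file at `path` to `write` in fixed-size chunks.

    Only one chunk (at most `chunk_size` bytes) is held in memory at a time,
    however large the file is. `write` is a hypothetical sink: any callable
    that accepts bytes, such as a socket's sendall or a response's write.
    Returns the total number of bytes written.
    """
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # b"" signals end of file
                break
            write(chunk)
            total += len(chunk)
    return total
```

Memory use is bounded by `chunk_size` rather than by the file size. That is what matters for the multi-megabyte zip files described here.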

In the past I've served everything with apache, but in this case I'm
dynamically creating folders to store the files when they are
uploaded, and you have to manage permissions between the squeak
process and filesystem served by apache, as well as making sure apache
is set up correctly whenever the app is deployed. Of course, this is
all doable, just a pain :)

Brian



_______________________________________________
Seaside mailing list
[hidden email]
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/seaside

Re: Serving files

Philippe Marschall
2006/10/6, Brian Brown <[hidden email]>:

> Hello all!
>
> It's been a while since I did any Seaside work, but now I have a
> project and am trying to get back into the swing of things....
>
> I'm writing an online repository which will essentially serve out some
> zip files. I remember some discussion on the groups a while back
> regarding streaming files from the disk without loading them
> completely into memory as they are served. Has anyone taken a crack at
> this?
>
> In the past I've served everything with apache, but in this case I'm
> dynamically creating folders to store the files when they are
> uploaded, and you have to manage permissions between the squeak
> process and filesystem served by apache, as well as making sure apache
> is set up correctly whenever the app is deployed. Of course, this is
> all doable, just a pain :)

Still sounds much better than serving the files with Squeak.

Philippe

Re: Serving files

Jason Johnson-3
Philippe Marschall wrote:

> Stil sounds much better than serving the files with Squeak.


Why?  Is it way too slow? Seaside uses the Comanche web server to work,
but that is a full web server.  Mine serves static pages as well as
seaside components.

Re: Serving files

Philippe Marschall
2006/10/8, Jason Johnson <[hidden email]>:

> Why?  Is it way too slow? Seaside uses the Komanche web server to work,
> but that is a full web server.  Mine serves static pages as well as
> seaside components.

Well the main reason is not speed but Squeak File IO in general. For
example it's not thread-safe. In C you could just do sendfile and the
kernel would send the file to the socket without buffering everything
twice and thrice and all the context switches. In Squeak, well even
testing for file existence is a journey. I don't think several files
of 50 MB or more can be handled by Squeak in a reasonable way.
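The `sendfile` call Philippe mentions is exposed in Python as `os.sendfile`, and `socket.socket.sendfile` wraps it. The sketch below shows the zero-copy path he describes. It is a Python analogue only, not something Squeak offers, and the function name is illustrative.

```python
import socket


def send_file_zero_copy(conn: socket.socket, path: str) -> int:
    """Send the whole file at `path` over the connected stream socket `conn`.

    socket.sendfile() hands the transfer to the kernel's sendfile(2) where
    the platform supports it, so the data moves to the socket without being
    copied through user-space buffers. Where sendfile(2) is unavailable it
    falls back to an ordinary read/send loop. Returns the number of bytes sent.
    """
    with open(path, "rb") as f:
        return conn.sendfile(f)
```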

Apache is just a proven, performant, very stable solution. If there's
a bug in Apache you're not supposed to fix it yourself and you're not
told that's fun because it's open sores and stuff. And oh yes, it
automatically makes use of multiple cpus/cores and takes load off your
Squeak image.

Philippe

Re: Serving files

cdavidshaffer
Philippe Marschall wrote:
>
> Well the main reason is not speed but Squeak File IO in general. For
> example it's not thread-safe.
Just a point of clarification: Squeak File I/O is "thread safe" (if, by
"thread" you mean Squeak process).  The problem with it is that the
Squeak VM uses blocking I/O calls for file I/O (not for socket I/O!).  
So it is quite possible that the OS will block all processes in the Squeak
image while reading or writing a file.  This can be a problem when
serving large files.  If you're serious about serving files from Squeak
you could use a modified ModFile which uses Async-I/O under platforms
that support it.  I posted a first attempt at a version of this to the
Squeak Wiki but it required a modified VM.  I think that it could be
done without modifying the VM though.  Anyway the correct solution is,
of course, making Squeak's underlying file I/O asynchronous just as the
socket I/O already is.
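Python's asyncio has the same shape as the problem David describes. A single-threaded scheduler runs many tasks, much as the Squeak VM runs many Squeak processes, and one blocking `read()` freezes all of them. A common workaround is to push each blocking file call onto a worker thread. The sketch below does that with `asyncio.to_thread` (Python 3.9+). It is an analogy, not how ModFile or the Squeak VM works.

```python
import asyncio


async def iter_file_async(path, chunk_size=64 * 1024):
    """Yield the file's contents chunk by chunk without stalling the event loop.

    Each blocking open/read/close runs on a worker thread via
    asyncio.to_thread, so other tasks on the loop keep running while
    the OS waits on the disk.
    """
    f = await asyncio.to_thread(open, path, "rb")
    try:
        while True:
            chunk = await asyncio.to_thread(f.read, chunk_size)
            if not chunk:
                break
            yield chunk
    finally:
        await asyncio.to_thread(f.close)
```

A handler can write each chunk to the client socket as it arrives. Other requests keep being serviced between chunks.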

David


Re: Serving files

timrowledge

On 11-Oct-06, at 5:10 AM, David Shaffer wrote:

> Philippe Marschall wrote:
>>
>> Well the main reason is not speed but Squeak File IO in general. For
>> example it's not thread-safe.
> Just a point of clarification: Squeak File I/O is "thread  
> safe" (if, by "thread" you mean Squeak process).

Assuming I understand 'thread safe' in the same way that you mean it, that
isn't strictly correct. The problem is that the Squeak model uses
separate positioning and read/writing calls. Thus it is quite
possible (been there....) to have two processes referring to the same  
file and get
procA -> position: a
procB -> position: b
procA -> read from position (which I thought was a!)
boom.
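The interleaving Tim shows happens because "position" and "read" are two separate calls on one shared stream. On POSIX systems, `pread(2)` merges them into one call: the offset is passed as an argument, and the shared file offset is neither used nor changed. The sketch below uses Python's `os.pread` (Unix only) to show that OS-level remedy. It does not describe Squeak's FileStream API, and `SharedFile` is a hypothetical name.

```python
import os


class SharedFile:
    """One file descriptor shared by many threads (hypothetical helper).

    A seek() followed by read() is two steps, so another thread can move the
    shared offset in between (the procA/procB interleaving). os.pread() takes
    the offset as an argument and leaves the shared offset alone, so each call
    is independent of what other threads do. POSIX only.
    """

    def __init__(self, path):
        self.fd = os.open(path, os.O_RDONLY)

    def read_at(self, offset, size):
        return os.pread(self.fd, size, offset)

    def close(self):
        os.close(self.fd)
```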


tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Computing Dictionary: Recursive: (see Recursive)



Re: Serving files

cdavidshaffer
tim Rowledge wrote:

>
> On 11-Oct-06, at 5:10 AM, David Shaffer wrote:
>
> Assuming I understand 'tread safe' in same way that you mean it, that
> isn't strictly correct. The problem is that the squeak model use
> separate positioning and read/writing calls. Thus is is quite possible
> (been there....) to have two processes referring to the same file and get
> procA -> position: a
> procB -> position: b
> procA -> read from position (which I thought was a!)
> boom.
>
I thought my meaning was the obvious one but now that I hear yours I'd
agree that I was wrong.  So...(let's hope the second try is a charm)

    Just a point of clarification: file I/O on a single Stream is not
    thread safe but Kom uses separate streams for each request so this
    has nothing to do with why Squeak makes a poor web server for
    serving static files.  The problem with it is that the Squeak VM
    uses blocking I/O calls for file I/O (not for socket I/O!).  So it is
    quite possible that the OS will block all processes in the Squeak
    image while reading or writing a file.  This can be a problem when
    serving large files.  If you're serious about serving files from
    Squeak you could use a modified ModFile which uses Async-I/O under
    platforms that support it.  I posted a first attempt at a version of
    this to the Squeak Wiki but it required a modified VM.  I think that
    it could be done without modifying the VM though.  Anyway the
    correct solution is, of course, making Squeak's underlying file I/O
    asynchronous just as the socket I/O already is.
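David's point that Kom opens a separate stream for each request is easy to illustrate. When every request gets its own handle, every request also gets its own read position. The sketch below is a Python thread-based analogue, not Kom code. `handle_request` and its `results`/`key` parameters are hypothetical and exist only to collect output for the demo.

```python
import threading


def handle_request(path, results, key, chunk_size=4096):
    """Serve one request with its own file object (hypothetical handler).

    Each call opens a fresh handle, so its read position is private to this
    request. Concurrent requests for the same file never move each other's
    offset. `results[key]` receives the bytes read, for demonstration only.
    """
    parts = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            parts.append(chunk)
    results[key] = b"".join(parts)
```

Launching several `threading.Thread(target=handle_request, ...)` on the same path shows that each one reads the complete file.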


So I've reworded my first sentence.  Do I need a third try? :-)


David


Re: Serving files

David T. Lewis
On Wed, Oct 11, 2006 at 01:15:37PM -0400, David Shaffer wrote:

>    Just a point of clarification: file I/O on a single Stream is not
>    thread safe but Kom uses separate streams for each request so this
>    has nothing to do with why Squeak makes a poor web server for
>    serving static files.  The problem with it is that the Squeak VM
>    uses blocking I/O calls for file I/O (not for socket I/O!).  So is
>    quite possible that the OS will block all processes in the Squeak
>    image while reading or writing a file.  This can be a problem when
>    serving large files.  If you're serious about serving files from
>    Squeak you could use a modified ModFile which uses Async-I/O under
>    platforms that support it.  I posted a first attempt at a version of
>    this to the Squeak Wiki but it required a modified VM.  I think that
>    it could be done without modifying the VM though.  Anyway the
>    correct solution is, of course, making Squeak's underlying file I/O
>    asychronous just as the socket I/O already is.

I don't think that blocking IO has anything to do with why Squeak
does or does not make a good web server, and it's only incidentally
related to the process/thread model in Squeak. But for what it's
worth:

To set a file stream for non-blocking reads, the required
primitives are in OSProcessPlugin (distributed with Unix VMs, or
roll your own for Windows), and used in OSProcess (SqueakMap).
See e.g. OSProcessAccessor>>setNonBlocking:. This is applicable
to OS pipes and other file-like external resources, as well as
to conventional files.
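`OSProcessAccessor>>setNonBlocking:` sets the POSIX `O_NONBLOCK` flag on a descriptor. The sketch below shows the same flag from Python on Unix, using `os.set_blocking`. It uses a pipe on purpose: according to the Linux `open(2)` man page, `O_NONBLOCK` has no effect on regular disk files there, so a pipe is where the behaviour is clearly observable. The helper names are illustrative.

```python
import os


def try_read(fd, size=4096):
    """Read from a non-blocking descriptor without stalling the caller.

    Returns the bytes read, b"" at end of file, or None when no data is
    available yet (the OS reports EAGAIN, which Python raises as
    BlockingIOError).
    """
    try:
        return os.read(fd, size)
    except BlockingIOError:
        return None


def open_nonblocking_pipe():
    """Create a pipe whose read end is in non-blocking (O_NONBLOCK) mode."""
    r, w = os.pipe()
    os.set_blocking(r, False)
    return r, w
```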

For "file event callbacks" (to borrow the Tcl/Tk terminology), you
can use the AIO plugin on Unix systems, which provides hooks into the
event-driven aio functions in the Unix VM. See AioEventHandler in
OSProcess. This uses aio callbacks along with file streams set in
non-blocking mode (the standard Squeak sockets and async files
work similarly). As long as you use non-blocking input, the Squeak
VM will not hang on input IO operations, and the aio callbacks will
signal Squeak semaphores to provide the asynchronous callbacks.
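The pattern Dave describes has two parts. The descriptor is put in non-blocking mode, and an event notification then triggers a callback only when data is ready. Python's standard `selectors` module offers the same readiness-callback style on Unix, and the sketch below uses it. In Squeak the callback signals a Semaphore, whereas here it is a plain Python callable. The helper names are illustrative and not part of OSProcess.

```python
import os
import selectors


def watch(sel, fd, callback):
    """Register `fd` so that callback(data) runs whenever it becomes readable."""
    os.set_blocking(fd, False)
    sel.register(fd, selectors.EVENT_READ, callback)


def dispatch_once(sel, timeout=None):
    """Wait for readiness events and run the registered callbacks.

    Returns the number of callbacks run (0 if the timeout expired).
    The descriptors are non-blocking, so os.read never stalls here.
    """
    dispatched = 0
    for key, _mask in sel.select(timeout):
        try:
            data = os.read(key.fd, 4096)
        except BlockingIOError:
            continue  # spurious wake-up: nothing to read after all
        key.data(data)
        dispatched += 1
    return dispatched
```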

None of this has anything to do with the process or threading model
of Squeak, other than the fact that if you don't set non-blocking
behavior on files (and sockets), you will lock up the VM on certain
read operations.

Dave


Re: Serving files

cdavidshaffer
David T. Lewis wrote:
> On Wed, Oct 11, 2006 at 01:15:37PM -0400, David Shaffer wrote:
>  
>
> I don't think that blocking IO has anything to do with why Squeak
> does or does not make a good web server, and it's only incidentally
> related to the process/thread model in Squeak. But for what it's
> worth:
>  
My tests show differently:

http://minnow.cc.gatech.edu/squeak/539


> To set a file stream for non-blocking reads, the required
> primitives are in OSProcessPlugin (distributed with Unix VMs, or
> roll your own for Windows), and used in OSProcess (Squeak Map).
> See e.g. OSProcessAccessor>>setNonBlocking:. This is applicable
> to OS pipes and other file-like external resources, as well as
> to conventional files.
>  
As I said in my post, it can be done but ModFile doesn't currently do
it.  My wiki page (above) provides a "proof of concept" version that
does just this but requires a VM change to be able to compute the file
size in a non-blocking way.  There are other blocking file I/O calls
involved in servicing a web request as well.  I suspect those need to be
dealt with to get maximum performance.  Still, just asynchronous file
I/O provides a huge improvement (see my benchmarks on that page).

This might be the appropriate time to mention that "hacking" at classes
like ModFile to make them perform async file I/O isn't solving the larger
problem: Squeak file I/O needs to not block the VM.  There are many web
applications that do file I/O and having the default behavior be VM
blocking will cause headaches for people developing these applications.

> None of this has anything to do with the process or threading model
> of Squeak, other than the fact that if you don't set non-blocking
> behavior on files (and sockets), you will lock up the VM on certain
> read operations.
>  

Yes, that was exactly my point.  The first response in this e-mail
thread brought the threading issue in and I wanted to point out that
this isn't the problem.

David


Re: Serving files

Philippe Marschall
2006/10/12, David Shaffer <[hidden email]>:

> Yes, that was exactly my point.  The first response in this e-mail
> thread brought the threading issue in and I wanted to point out that
> this isn't the problem.

If two different processes want to access the same file (each creating
its own filestream !) things can get fucked up because the file
registry is not thread-safe.
This isn't theoretical. This happened on one of our production applications.

And about the OSProcessPlugin, it deadlocks from time to time. This
again happened in one of our production applications too and is the
reason why Avi doesn't use it for DabbleDB and uses cgi scripts
instead.

Philippe

Re: Serving files

David T. Lewis
In reply to this post by cdavidshaffer
On Wed, Oct 11, 2006 at 11:09:18PM -0400, David Shaffer wrote:
>
> This might be the appropriate time to mention that "hacking" at classes
> like ModFile to make them perform asyc file I/O isn't solving the larger
> problem: Squeak file I/O needs to not block the VM.  There are many web
> applications that do file I/O and having the default behavior be VM
> blocking will cause headaches for people developing these applications.

That's why I mentioned the OSPP primitive. If you are using Unix/Linux
or OS X, you probably already have #primitiveSQFileSetNonBlocking
available.  You're right that this is not a general solution, but
if you do need nonblocking file IO, it's worth knowing that the
primitive is already there. No doubt it would be better to have
this as an optional primitive in the FilePlugin and better integrated
into the file stream classes.

Dave


Re: Serving files

David T. Lewis
In reply to this post by Philippe Marschall
On Thu, Oct 12, 2006 at 08:33:46AM +0200, Philippe Marschall wrote:
> And about the OSProcessPlugin, it deadlocks from time to time. This
> again happened in one of our production applications too and is the
> reason why Avi doesn't use it for DabbleDB and uses cgi scripts
> instead.

Do you have any information as to what caused the deadlock? I am
aware that file locking between images has proved to be unreliable.
I'd like to know if there are other issues. Thanks.

Dave
 

Re: Serving files

Philippe Marschall
2006/10/12, David T. Lewis <[hidden email]>:
> On Thu, Oct 12, 2006 at 08:33:46AM +0200, Philippe Marschall wrote:
> > And about the OSProcessPlugin, it deadlocks from time to time. This
> > again happened in one of our production applications too and is the
> > reason why Avi doesn't use it for DabbleDB and uses cgi scripts
> > instead.
>
> Do you have any information as to what caused the deadlock? I am
> aware that file locking between images has proved to be unreliable.
> I'd like to know if there are other issues. Thanks.

UnixProcess >> #waitForCommand:

proc runState == #complete

always returns false although the process has actually terminated.

Philippe

Re: Serving files

David T. Lewis
On Thu, Oct 12, 2006 at 01:08:57PM +0200, Philippe Marschall wrote:

> 2006/10/12, David T. Lewis <[hidden email]>:
> >On Thu, Oct 12, 2006 at 08:33:46AM +0200, Philippe Marschall wrote:
> >> And about the OSProcessPlugin, it deadlocks from time to time. This
> >> again happened in one of our production applications too and is the
> >> reason why Avi doesn't use it for DabbleDB and uses cgi scripts
> >> instead.
> >
> >Do you have any information as to what caused the deadlock? I am
> >aware that file locking between images has proved to be unreliable.
> >I'd like to know if there are other issues. Thanks.
>
> UnixProcess >> #waitForCommand:
>
> proc runState == #complete
>
> always return false although the process is actually terminated.

Philippe,

Thanks. Assuming that the process was actually terminating (as opposed
to blocking on output for some reason), this seems to suggest that a
death of child signal got missed.
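If the missed death-of-child signal theory is right, a general Unix defence is to avoid depending only on the SIGCHLD notification and to also poll the child with a non-blocking `waitpid`. The sketch below shows that technique with Python's `subprocess`. It does not reproduce OSProcess or `UnixProcess>>waitForCommand:`, and `wait_for_exit` is a hypothetical helper.

```python
import subprocess
import time


def wait_for_exit(proc, timeout, interval=0.05):
    """Wait up to `timeout` seconds for `proc` (a subprocess.Popen) to exit.

    On Unix, Popen.poll() reaps the child with a non-blocking waitpid() call,
    so it notices termination even if a SIGCHLD notification was lost or
    merged with another one. Returns the exit code, or None on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        code = proc.poll()
        if code is not None:
            return code
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```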

Can you please tell me the operating system (OS X, or Linux) and type
of VM (Ian's or John's)? Also, are there any other plugins added that
might be doing additional pthread activity?

Thanks for your feedback on this, much appreciated.

Dave


Re: Serving files

cdavidshaffer
In reply to this post by Philippe Marschall
Philippe Marschall wrote:
>
> If two different processes want to access the same file (each creating
> its own filestream !) things can get fucked up because the file
> registry is not thread-safe.
> This isn't theoretical. This happened on one of our production
> applications.
What threads?  Assuming you are talking about Squeak processes...What
file registry?  Are you talking about the collection of weak references
to open file streams?..or something at the VM level?  The open file
stream registry is designed to be safe for concurrent access from Squeak
processes.  My experience is that it is.  I've pounded Linux-based file
servers over very long periods of time at high connection rates (see the
wiki page I cited in my last e-mail) and I've absolutely never seen this
issue.  Has it been discussed elsewhere?  Can you suggest how to
reproduce it?

David


Re: Serving files

Jason Johnson-3
In reply to this post by cdavidshaffer
David Shaffer wrote:

>    Just a point of clarification: file I/O on a single Stream is not
>    thread safe

I know of no languages that are.  If two processes are sharing the same
data structure, then that will always have race conditions, unless every
access is blocked by a Mutex (which, of course, you don't want).
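The mutex approach Jason mentions looks like this when sketched in Python. A single lock makes "set position, then read" one indivisible step for every thread that goes through the wrapper. That removes Tim's race, but every access is serialized, which is the cost Jason says you usually don't want. `LockedStream` is a hypothetical name.

```python
import threading


class LockedStream:
    """Wrap one shared file object so seek + read happen as a single step.

    The lock makes the position change and the read atomic with respect to
    other threads using the same wrapper. Correctness is bought by
    serializing every access.
    """

    def __init__(self, f):
        self._f = f
        self._lock = threading.Lock()

    def read_at(self, offset, size):
        with self._lock:
            self._f.seek(offset)
            return self._f.read(size)
```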

Re: Serving files

Mark Miller
In reply to this post by Brian Brown-6
-------------- Original message --------------
From: Jason Johnson <[hidden email]>

> > Just a point of clarification: file I/O on a single Stream is not
> > thread safe
>
> I know of no languages that are. If two processes are sharing the same
> data structure, then that will always have race conditions, unless every
> access is blocked by a Mutex (which, of course, you don't want).
Agreed. It sounds similar to a classic readers-writers problem, where you have multiple threads sharing a memory space, but this is with a file. In effect, it sounds like when a process reads from a file it also writes a new file position to the file I/O process, in effect "changing the buffer" for other processes that also want to read from it. I must admit I'm a total newbie to Squeak (I'll be brushing up on it soon), but from both of your descriptions it would appear that the only way to, in effect, make it thread safe would be to use mutexes, as Jason said, short of changing the VM, which has been mentioned earlier.
 
An idea might be to have a "thread file handler" architecture that each thread in Squeak could use. Perhaps this could be written in Smalltalk. It would act as an intermediary between the thread and the file I/O process. Its whole job would be to handle file input, keep track of each thread's position in the file, and implement the mutex action. To make it an effective tool one would have to implement "time-sharing" of the input--limit a thread's bandwidth in terms of bytes read over a period of time. So each thread would get some file input time, rather than one thread hogging the file until it's read all the way through it.
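Mark's "thread file handler" can be sketched in Python as one object that owns the file. It keeps a position for each reader, serializes access with a lock, and hands out at most a fixed quantum of bytes per call. The class and parameter names are hypothetical. A fixed per-call quantum is a simplification: a true bandwidth limit would also need to account for time.

```python
import threading


class SharedFileHandler:
    """Hypothetical intermediary that owns one open file for many readers.

    It remembers each reader's position, serializes access with a lock, and
    returns at most `quantum` bytes per call. Readers that call read_next()
    in turn therefore share the file round-robin instead of one reader
    draining it first.
    """

    def __init__(self, path, quantum=4096):
        self._f = open(path, "rb")
        self._lock = threading.Lock()
        self._positions = {}  # reader id -> next offset to read
        self.quantum = quantum

    def read_next(self, reader_id):
        """Return the reader's next chunk (b"" once it has reached EOF)."""
        with self._lock:
            pos = self._positions.get(reader_id, 0)
            self._f.seek(pos)
            chunk = self._f.read(self.quantum)
            self._positions[reader_id] = pos + len(chunk)
            return chunk

    def close(self):
        self._f.close()
```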
 
A possible workaround to the problem discussed earlier would be to have a master copy of the file, and any time a thread needed it, it would generate a unique ID, make a copy of the original to a filename whose name is that ID, and then read from the copy, rather than the original, and delete the copy when it was done.
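Mark's copy-per-reader workaround is sketched below in Python. It is a hypothetical helper, not anything from Squeak. `tempfile.mkstemp` plays the role of the "unique ID" because it creates a file whose name no other caller can receive. Every reader pays for a full copy before it can start reading, and as Jason notes in his reply, opening a separate stream per reader already gives each one its own position.

```python
import os
import shutil
import tempfile


def read_via_private_copy(master_path, consume):
    """Copy the master file to a uniquely named temporary file, let
    `consume(file_object)` read the copy, then delete the copy.

    Returns whatever `consume` returns. The copy is removed even if
    `consume` raises.
    """
    directory = os.path.dirname(os.path.abspath(master_path))
    fd, copy_path = tempfile.mkstemp(dir=directory, suffix=".copy")
    os.close(fd)  # mkstemp opens the file; we reopen it after copying
    try:
        shutil.copyfile(master_path, copy_path)
        with open(copy_path, "rb") as f:
            return consume(f)
    finally:
        os.remove(copy_path)
```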
 
I assume this just has to do with multiple processes reading from the same file, not multiple processes trying to read from different files, correct?
 
---Mark


Re: Serving files

Philippe Marschall
In reply to this post by cdavidshaffer
2006/10/12, David Shaffer <[hidden email]>:
> Philippe Marschall wrote:
> >
> > If two different processes want to access the same file (each creating
> > its own filestream !) things can get fucked up because the file
> > registry is not thread-safe.
> > This isn't theoretical. This happened on one of our production
> > applications.
> What threads?  Assuming you are talking about Squeak processes...

Yes

> What
> file registry?  Are you talking about the collection of weak references
> to open file streams?

Yes. The Registry class var in StandardFileStream.

> ..or something at the VM level?  The open file
> stream registry is designed to be safe for concurrent access from Squeak
> processes.  My experience is that it is.  I've pounded Linux-based file
> servers over very long periods of time at high connection rates (see the
> wiki page I sited in my last e-mail) and I've absolutely never seen this
> issue.  Has it been discussed elsewhere?  Can you suggest how to
> reproduce it?

What we were experiencing is that it contained not yet fully
initialized instances so the test for inclusion failed.

IIRC we just fixed the symptom. Wrapped a Mutex around the code that
creates the filestreams.
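Philippe's fix places a mutex around the code that creates file streams, so the registry never exposes an instance that is only half initialized. The sketch below shows that pattern in Python. `FileRegistry` is a hypothetical analogue of the `Registry` class variable in StandardFileStream, not its actual code. Construction, registration and the inclusion test all happen under the same lock.

```python
import threading


class FileRegistry:
    """Hypothetical registry of open files, safe for concurrent use.

    The file is opened and registered inside one critical section, and
    membership checks take the same lock. Another thread therefore never
    observes an entry whose construction has not finished.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._open = set()

    def open(self, path, mode="rb"):
        with self._lock:
            f = open(path, mode)
            self._open.add(f)
        return f

    def is_registered(self, f):
        with self._lock:
            return f in self._open

    def close(self, f):
        with self._lock:
            self._open.discard(f)
        f.close()

    def __len__(self):
        with self._lock:
            return len(self._open)
```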

Philippe

Re: Serving files

Jason Johnson-3
In reply to this post by Mark Miller
[hidden email] wrote:

> -------------- Original message --------------
> Agreed. It sounds similar to a classic readers-writers problem, where
> you have multiple threads sharing a memory space, but this is with a
> file. In effect, it sounds like when a process reads from a file it
> also writes a new file position to the file I/O process, in effect
> "changing the buffer" for other processes that also want to read from
> it. I must admit I'm a total newbie to Squeak (I'll be brushing up on
> it soon), but from both of your descriptions it would appear that the
> only way to, in effect, make it thread safe would be to use mutexes,
> as Jason said, short of changing the VM, which has been mentioned earlier.

Yeah, the problem is, each thread starts with a snapshot which they
change, but have no way of letting the other guys know it changed.  You
get the same thing if you edit a file on a server at the same time as
someone else.  They save, then you save.  Their work is completely lost
at that point since your editor program took a copy of the file before
it started.

But this isn't a real problem actually.  No one should be trying to
share data structures between threads by default.  I don't think anyone
is now.  David was just clarifying that trying to isn't safe, and I was
just pointing out that this isn't *Smalltalk* specific.  It isn't safe
in C/C++/Java/Oz or anything else.  If someone really *does* want to
share a data structure across threads (for speed or something) then they
have to know what they are doing and protect it with a mutex or something.

>  
> An idea might be to have a "thread file handler" architecture, that
> each thread in Squeak could use. Perhaps this could be written in
> Smalltalk. It would act as an intermediary between the thread and the
> file I/O process. It's whole job would be to handle file input, keep
> track of each thread's position in the file, and implement the mutex
> action. To make it an effective tool one would have to implement
> "time-sharing" of the input--limit a thread's bandwidth in terms of
> bytes read over a period of time. So each thread would get some
> file input time, rather than one thread hogging the file until it's
> read all the way through it.
>  
> A possible workaround to the problem discussed earlier would be to
> have a master copy of the file, and any time a thread needed it, it
> would generate a unique ID, make a copy of the original to a filename
> whose name is that ID, and then read from the copy, rather than the
> original, and delete the copy when it was done.
>  
> I assume this just has to do with multiple processes reading from the
> same file, not multiple processes trying to read from different files,
> correct?

Well, as I said above, there is no problem to solve here with the files,
per se.  However, you do bring up a good point:  perhaps the Smalltalk
VM should get further into the "OS business" than it is now. What I mean
by that is: right now it sounds like there are problems with certain calls
blocking the VM, etc.  Perhaps primitives should be seen in the same way
as the OS sees a system call; if a thread makes a syscall or primitive
call it gets put to sleep, the OS (or VM in this case) makes an async
request (exactly how disk access works at the OS level) and gets some
kind of notification when it is complete, fills in the data where the
thread expects it and wakes it up again.
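Jason's proposal works like a system call: a thread that makes a primitive call is put to sleep, the request is issued asynchronously, and the thread is woken with the result. The toy scheduler below sketches that model in Python. It is not a description of any Squeak VM. Generators stand in for green threads, and a thread pool stands in for the asynchronous request machinery. `run_tasks` is a hypothetical name.

```python
from concurrent.futures import FIRST_COMPLETED, wait


def run_tasks(tasks, pool):
    """Drive generator-based tasks whose blocking calls run on `pool`.

    Each task yields a zero-argument callable (its "primitive call"). The
    scheduler parks the task, submits the call to the worker pool (the
    asynchronous request), and resumes the task with the result once the
    future completes. Other tasks keep running in the meantime.
    Returns {task: return value}.
    """
    results = {}
    pending = {}                                 # future -> parked task
    runnable = [(task, None) for task in tasks]  # (task, value to send in)
    while runnable or pending:
        for task, value in runnable:
            try:
                call = task.send(value)          # run until the next blocking call
            except StopIteration as stop:
                results[task] = stop.value       # task finished
                continue
            pending[pool.submit(call)] = task    # put the task to sleep
        runnable = []
        if pending:
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:                     # wake each task with its result
                runnable.append((pending.pop(fut), fut.result()))
    return results
```

A task is written as `def task(n): value = yield functools.partial(some_blocking_call, n); return value`, and it is run inside `with ThreadPoolExecutor() as pool: run_tasks([...], pool)`.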