Adding fsync() call to the primitiveFileFlush prim ?


timrowledge
We have an interesting problem in Pi-land where many teachers report kids losing their Scratch work (and other, but I don’t get to fix that) because of pulling the power before fully shutting down etc. This tends to put them off trying again, apparently.

We already close files properly after writing out files but it seems that dear ol’unix likes to save actual writing for later, perhaps when it has time for a relaxing latté or whatever. It has been suggested that using fsync() might force the lazy writer to actually do its job properly, which seems reasonable. In thinking about where to add this I see a couple of obvious possibilities
a) a new primitive
b) add fsync() (suitably wrapped in case of non-availability) to the end of the sqFileFlush() called in primitiveFileFlush() code.

I think I prefer b) personally but I’m happy to be educated. Possible problems include some OSes not having an fsync option; so far as I can see Windows already uses a totally disjoint set of code.
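To make option b) concrete, here is a minimal sketch of a flush routine that follows fflush() with fsync(). It is not the actual sqFileFlush() source: the function name flush_and_sync and the NO_FSYNC build macro are made up for illustration, and the real code would operate on the VM's file record rather than a bare FILE *.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#ifndef NO_FSYNC            /* hypothetical macro for platforms lacking fsync() */
#include <unistd.h>         /* fsync() */
#endif

/* Hypothetical helper, not the real sqFileFlush(): push the stdio buffer to
 * the kernel, then ask the kernel to commit the file to the storage device.
 * Returns 0 on success, -1 on failure. */
static int flush_and_sync(FILE *f)
{
    if (fflush(f) != 0)             /* user-space buffer -> kernel */
        return -1;
#ifndef NO_FSYNC
    if (fsync(fileno(f)) != 0)      /* kernel -> device */
        return -1;
#endif
    return 0;
}
```

Building with -DNO_FSYNC would reduce this to a plain fflush(), which is one way of doing the "suitably wrapped" part.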

Can anyone think of bad things happening if I do this? Think of the children….

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Strange OpCodes: JSP: Jump on Sexy Programmer




Re: Adding fsync() call to the primitiveFileFlush prim ?

Eliot Miranda-2
Hi Tim,

On Sat, May 21, 2016 at 9:50 AM, tim Rowledge <[hidden email]> wrote:
We have an interesting problem in Pi-land where many teachers report kids losing their Scratch work (and other, but I don’t get to fix that) because of pulling the power before fully shutting down etc. This tends to put them off trying again, apparently.

We already close files properly after writing out files but it seems that dear ol’unix likes to save actual writing for later, perhaps when it has time for a relaxing latté or whatever. It has been suggested that using fsync() might force the lazy writer to actually do its job properly, which seems reasonable. In thinking about where to add this I see a couple of obvious possibilities
a) a new primitive
b) add fsync() (suitably wrapped in case of non-availability) to the end of the sqFileFlush() called in primitiveFileFlush() code.

It could be optional so that either
c) it's only done on ARM Linux builds (via e.g. -DFlushAlsoFsyncs=1), or
d) a command-line flag to the VM (e.g. -fsynconflush) which can be set by default in the squeak or scratch startup script on rpi.

This seems to me more appropriate.  I think d) is worth the effort, but c) would be very simple.
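A rough sketch of how c) and d) could fit together, assuming the compile-time value simply sets the default for a run-time switch. FlushAlsoFsyncs and -fsynconflush are the names proposed above; parse_vm_flag, flush_file and fsync_on_flush are hypothetical names, not existing VM code.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Option c): build with -DFlushAlsoFsyncs=1 to make syncing the default. */
#ifndef FlushAlsoFsyncs
# define FlushAlsoFsyncs 0
#endif

/* Run-time switch; starts out at the compile-time default. */
static int fsync_on_flush = FlushAlsoFsyncs;

/* Option d): hypothetical argument handler.
 * Returns 1 if the argument was recognised and consumed, 0 otherwise. */
static int parse_vm_flag(const char *arg)
{
    if (strcmp(arg, "-fsynconflush") == 0) {
        fsync_on_flush = 1;
        return 1;
    }
    return 0;
}

/* Hypothetical flush routine: always fflush(), fsync() only when enabled. */
static int flush_file(FILE *f)
{
    if (fflush(f) != 0)
        return -1;
    if (fsync_on_flush && fsync(fileno(f)) != 0)
        return -1;
    return 0;
}
```

With this shape, other platforms keep the cheap behaviour unless the startup script passes the flag.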

Any other suggestions?
 

I think I prefer b) personally but I’m happy to be educated. Possible problems include some OSes not having an fsync option; so far as I can see Windows already uses a totally disjoint set of code.

Can anyone think of bad things happening if I do this? Think of the children….

Your consequent popularity could go to your head...?
 

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Strange OpCodes: JSP: Jump on Sexy Programmer






--
_,,,^..^,,,_
best, Eliot



Re: Adding fsync() call to the primitiveFileFlush prim ?

David T. Lewis
In reply to this post by timrowledge
On Sat, May 21, 2016 at 09:50:16AM -0700, tim Rowledge wrote:

> We have an interesting problem in Pi-land where many teachers report kids losing their Scratch work (and other, but I don’t get to fix that) because of pulling the power before fully shutting down etc. This tends to put them off trying again, apparently.
>
> We already close files properly after writing out files but it seems that dear ol’unix likes to save actual writing for later, perhaps when it has time for a relaxing latté or whatever. It has been suggested that using fsync() might force the lazy writer to actually do its job properly, which seems reasonable. In thinking about where to add this I see a couple of obvious possibilities
> a) a new primitive
> b) add fsync() (suitably wrapped in case of non-availability) to the end of the sqFileFlush() called in primitiveFileFlush() code.
>
> I think I prefer b) personally but I’m happy to be educated. Possible problems include some OSes not having an fsync option; so far as I can see Windows already uses a totally disjoint set of code.
>
> Can anyone think of bad things happening if I do this? Think of the children….
>

I suspect that the problem lies elsewhere. The difference between fsync()
and fflush() is basically that fflush() functions at the higher level stdio
(C runtime lib) level, and fsync() is a system call at a lower level that
operates directly on the file descriptor.

The unix file functions are written to the stdio level, so the existing call
to fflush() is the right thing to do. I do not know if mixing a lower level
call in with this would hurt anything, but it will not help anything either
and it does not seem like a good thing to do.

My guess would be that there is some path through the code in the image where
you might be able to add a #flush to address the issue. This can have performance
tradeoffs, but reliable would be better than fast in a case like this.

Dave
 


Re: Adding fsync() call to the primitiveFileFlush prim ?

Eliot Miranda-2
Hi David,

On Sat, May 21, 2016 at 10:15 AM, David T. Lewis <[hidden email]> wrote:
On Sat, May 21, 2016 at 09:50:16AM -0700, tim Rowledge wrote:
> We have an interesting problem in Pi-land where many teachers report kids losing their Scratch work (and other, but I don’t get to fix that) because of pulling the power before fully shutting down etc. This tends to put them off trying again, apparently.
>
> We already close files properly after writing out files but it seems that dear ol’unix likes to save actual writing for later, perhaps when it has time for a relaxing latté or whatever. It has been suggested that using fsync() might force the lazy writer to actually do its job properly, which seems reasonable. In thinking about where to add this I see a couple of obvious possibilities
> a) a new primitive
> b) add fsync() (suitably wrapped in case of non-availability) to the end of the sqFileFlush() called in primitiveFileFlush() code.
>
> I think I prefer b) personally but I’m happy to be educated. Possible problems include some OSes not having an fsync option; so far as I can see Windows already uses a totally disjoint set of code.
>
> Can anyone think of bad things happening if I do this? Think of the children….
>

I suspect that the problem lies elsewhere. The difference between fsync()
and fflush() is basically that fflush() functions at the higher level stdio
(C runtime lib) level, and fsync() is a system call at a lower level that
operates directly on the file descriptor.

fflush and fsync are different.  fflush merely ensures that the stdio buffers in the process are flushed to the kernel file state via write calls. fsync, however, ensures that the kernel state for that file descriptor is written to disc; sync does this for all kernel file state.  I think Tim's diagnosis and solution are correct.
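The two layers can be made visible with a small experiment. The sketch below (hypothetical helper names, POSIX only) writes a short message through stdio and uses fstat() to ask the kernel how big it thinks the file is at each stage. On a freshly opened, fully buffered file the kernel normally sees nothing until fflush() runs. The fsync() step changes nothing observable from user space; its effect only matters if the power goes away.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Size of the file as the kernel currently sees it, or -1 on error. */
static long kernel_visible_size(FILE *f)
{
    struct stat st;
    if (fstat(fileno(f), &st) != 0)
        return -1;
    return (long)st.st_size;
}

/* Write msg via stdio and record the kernel-visible size after each stage:
 * sizes[0] after fputs(), sizes[1] after fflush(), sizes[2] after fsync().
 * Returns 0 on success, -1 on failure. */
static int write_in_stages(FILE *f, const char *msg, long sizes[3])
{
    if (fputs(msg, f) == EOF)
        return -1;
    sizes[0] = kernel_visible_size(f);  /* data may still sit in the stdio buffer */
    if (fflush(f) != 0)
        return -1;
    sizes[1] = kernel_visible_size(f);  /* data handed to the kernel via write() */
    if (fsync(fileno(f)) != 0)
        return -1;
    sizes[2] = kernel_visible_size(f);  /* kernel asked to commit it to the device */
    return 0;
}
```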


The unix file functions are written to the stdio level, so the existing call
to fflush() is the right thing to do. I do not know if mixing a lower level
call in with this would hurt anything, but it will not help anything either
and it does not seem like a good thing to do.

My guess would be that there is some path through the code in the image where
you might be able to add a #flush to address the issue. This can have performance
tradeoffs, but reliable would be better than fast in a case like this.

Dave





--
_,,,^..^,,,_
best, Eliot



Re: Adding fsync() call to the primitiveFileFlush prim ?

marcel.taeumel
In reply to this post by timrowledge
Hi Tim,

in Windows, this is called FlushFileBuffers, I guess:
https://msdn.microsoft.com/en-us/library/windows/desktop/aa364439%28v=vs.85%29.aspx

MSDN also suggests using unbuffered I/O instead of calling such a flush function too often. What are our options to control buffered vs. unbuffered from Squeak land?

https://support.microsoft.com/en-us/kb/99794
https://msdn.microsoft.com/en-us/library/windows/desktop/cc644950%28v=vs.85%29.aspx

On what media is the data stored? I think that you cannot be 100% sure to have all data written after some function call returns because some details are out of reach for user applications. Think of some USB driver that needs just two more cycles to finish writing... I am no expert there but it seems tricky to find the correct point in time to turn the power off. Regular OS shutdown seems more appropriate...

Best,
Marcel

Re: Adding fsync() call to the primitiveFileFlush prim ?

timrowledge
In reply to this post by Eliot Miranda-2
The issue here is that the PI - especially when used in schools - is storing everything on a micro-SD card. Being surrounded by kids is a scary thing for a computer. They don’t necessarily bother to do a nice system shutdown or even exit Scratch before yanking the power. Teachers don’t necessarily know to tell them to; lots of people doing their best with insufficient knowledge.

An interesting thing is that I ‘remembered’ that we flush files when closing them but in fact we don’t. The file flush primitive is barely used, so far as I can see only really for stdio flushing. So adding an fsync call to the flush primitive would barely affect anything and I’d have to amend the Scratch file writing code to use it.

Is anyone using code that regularly flushes filestreams and might have a performance issue if an fsync were added?

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Strange OpCodes: BOMB: Burn Out Memory Banks




Re: Adding fsync() call to the primitiveFileFlush prim ?

Eliot Miranda-2
Hi Tim,

On Sat, May 21, 2016 at 10:49 AM, tim Rowledge <[hidden email]> wrote:
The issue here is that the PI - especially when used in schools - is storing everything on a micro-SD card. Being surrounded by kids is a scary thing for a computer. They don’t necessarily bother to do a nice system shutdown or even exit Scratch before yanking the power. Teachers don’t necessarily know to tell them to; lots of people doing their best with insufficient knowledge.

An interesting thing is that I ‘remembered’ that we flush files when closing them but in fact we don’t. The file flush primitive is barely used, so far as I can see only really for stdio flushing. So adding an fsync call to the flush primitive would barely affect anything and I’d have to amend the Scratch file writing code to use it.

Is anyone using code that regularly flushes filestreams and might have a performance issue if an fsync were added?

Well, the case is that it /is/ a performance issue.  Writing to disc is way more expensive than flushing to kernel buffers.  How about saying what you think about my c) and d) options?  That's a way of avoiding the performance issue and solving the kids-are-humans issue.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Strange OpCodes: BOMB: Burn Out Memory Banks
 
_,,,^..^,,,_
best, Eliot



Re: Adding fsync() call to the primitiveFileFlush prim ?

timrowledge

> On 21-05-2016, at 11:02 AM, Eliot Miranda <[hidden email]> wrote:
> Well, the case is that it /is/ a performance issue.  Writing to disc is way more expensive than flushing to kernel buffers.  How about saying what you think about my c) and d) options?  That's a way of avoiding the performance issue and solving the kids-are-humans issue.

Either would be just fine by me, no problem. Adding a new prim might be even better since it would defer control up to the image, which always seems the best option to me. However, opening the can of rancid worms that is the file system interface doesn’t appeal too much right now. I think for simplicity and testing I’ll stick with a #define and we can revisit it later if it seems to solve any real-world problems.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Useful random insult:- Got into the gene pool while the lifeguard wasn't watching.




Re: Adding fsync() call to the primitiveFileFlush prim ?

David T. Lewis
In reply to this post by marcel.taeumel
On Sat, May 21, 2016 at 09:47:07AM -0700, marcel.taeumel wrote:
> Hi Tim,
>
> in Windows, this is called FlushFileBuffers, I guess:
> https://msdn.microsoft.com/en-us/library/windows/desktop/aa364439%28v=vs.85%29.aspx

Slightly off topic, but worth mentioning: The implementation of FilePlugin
for Windows operates on HANDLE references to files, which I believe are roughly
equivalent to file descriptors on Unix. Thus the Unix VMs are written to the
higher level stdio interface, and the Windows VM uses a more direct lower level
IO strategy. I have always wondered which of the two approaches (low level
HANDLE/descriptor versus higher level buffered stdio) produces better overall
performance for Squeak.

One way to answer the question would be to implement a FilePlugin for Unix
VMs with all of the IO done at the descriptor level. Specifically, a
SQFile->file would be a reference to an integer file descriptor (similar to
a Windows HANDLE), and the platform support code would operate against
file descriptors rather than (FILE *) references.
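As a rough idea of what the descriptor-level support code could look like, here is a minimal sketch. The FdFile struct and the fdfile_* functions are invented for illustration; the real SQFile record and the FilePlugin entry points have a different layout and different signatures. Error handling is deliberately thin, for instance EINTR is not retried.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical descriptor-based file record (not the real SQFile). */
typedef struct {
    int fd;         /* plays the role of a Windows HANDLE */
    int writable;
} FdFile;

static int fdfile_open(FdFile *f, const char *path, int writable)
{
    int flags = writable ? (O_RDWR | O_CREAT) : O_RDONLY;
    f->fd = open(path, flags, 0644);
    f->writable = writable;
    return f->fd >= 0 ? 0 : -1;
}

/* write() may transfer fewer bytes than requested, so loop until done. */
static ssize_t fdfile_write(FdFile *f, const void *buf, size_t count)
{
    const char *p = buf;
    size_t left = count;
    while (left > 0) {
        ssize_t n = write(f->fd, p, left);
        if (n < 0)
            return -1;
        p += n;
        left -= (size_t)n;
    }
    return (ssize_t)count;
}

static ssize_t fdfile_read(FdFile *f, void *buf, size_t count)
{
    return read(f->fd, buf, count);
}

/* With no stdio layer there is nothing to fflush(); syncing is one call. */
static int fdfile_sync(FdFile *f)
{
    return fsync(f->fd);
}

static int fdfile_close(FdFile *f)
{
    int r = close(f->fd);
    f->fd = -1;
    return r;
}
```

Note that every fdfile_write() becomes a system call, so without buffering somewhere else (for example in the image) many small writes would likely be slower than the stdio version.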

Doing a reimplementation of FilePlugin for Unix is probably not a huge
project, but I have never gotten around to trying it.

Has anyone else wondered about this? Which is better, the Windows VM file
strategy, or the Unix VM file strategy?

Dave

>
> MSDN also suggests to use unbuffered I/O instead of calling such a flush
> function too often. What are our options to control buffered vs. unbuffered
> from Squeak land?
>
> https://support.microsoft.com/en-us/kb/99794
> https://msdn.microsoft.com/en-us/library/windows/desktop/cc644950%28v=vs.85%29.aspx
>
> On what media is the data stored? I think that you cannot be 100% sure to
> have all data written after some function call returns because some details
> are out of reach for user applications. Think of some USB driver that needs
> just two more cycles to finish writing... I am no expert there but it seems
> tricky to find the correct point in time to turn the power off. Regular OS
> shutdown seems more appropriate...
>
> Best,
> Marcel
>
>
>
> --
> View this message in context: http://forum.world.st/Adding-fsync-call-to-the-primitiveFileFlush-prim-tp4896538p4896545.html
> Sent from the Squeak - Dev mailing list archive at Nabble.com.


Re: Adding fsync() call to the primitiveFileFlush prim ?

David T. Lewis
In reply to this post by timrowledge
On Sat, May 21, 2016 at 10:49:24AM -0700, tim Rowledge wrote:
> The issue here is that the PI - especially when used in schools - is storing everything on a micro-SD card. Being surrounded by kids is a scary thing for a computer. They don’t necessarily bother to do a nice system shutdown or even exit Scratch before yanking the power. Teachers don’t necessarily know to tell them to; lots of people doing their best with insufficient knowledge.
>

D'oh, now I get it. I was not thinking of the case of yanking the power cord.
I can well imagine that this might be a bit disruptive for normal process exit
cleanups that are supposed to ensure that fflushed buffers actually make it
to the disk-like media.

> An interesting thing is that I ‘remembered’ that we flush files when closing them but in fact we don’t.

In a perfect world you do not need to flush a file when closing it, because
closing it implies a flush (e.g. fclose performs an fflush). That said,
yanking the power cord might introduce some imperfections.

Dave



Re: Adding fsync() call to the primitiveFileFlush prim ?

John Pfersich-2



Sent from my iPad

> On May 21, 2016, at 22:07, David T. Lewis <[hidden email]> wrote:
>
>> On Sat, May 21, 2016 at 10:49:24AM -0700, tim Rowledge wrote:
>> The issue here is that the PI - especially when used in schools - is storing everything on a micro-SD card. Being surrounded by kids is a scary thing for a computer. They don’t necessarily bother to do a nice system shutdown or even exit Scratch before yanking the power. Teachers don’t necessarily know to tell them to; lots of people doing their best with insufficient knowledge.
>
> D'oh, now I get it. I was not thinking of the case of yanking the power cord.
> I can well imagine that this might be a bit disruptive for normal process exit
> cleanups that are supposed to ensure that fflushed buffers actually make it
> to the disk-like media.
>
>> An interesting thing is that I ‘remembered’ that we flush files when closing them but in fact we don’t.
>
> In a perfect world you do not need to flush a file when closing it, because
> closing it implies a flush (e.g. fclose performs an fflush). That said,
> yanking the power cord might introduce some imperfections.
>
> Dave
>
>
And I don't think that people that yank the power cord should be catered to. If you do stupid things, you should pay the consequences. A computer isn't a toaster. And teachers should convey that to their students.

Re: Adding fsync() call to the primitiveFileFlush prim ?

Eliot Miranda-2




_,,,^..^,,,_ (phone)

> On May 21, 2016, at 10:36 PM, John Pfersich <[hidden email]> wrote:
>
>
>
>
> Sent from my iPad
>>> On May 21, 2016, at 22:07, David T. Lewis <[hidden email]> wrote:
>>>
>>> On Sat, May 21, 2016 at 10:49:24AM -0700, tim Rowledge wrote:
>>> The issue here is that the PI - especially when used in schools - is storing everything on a micro-SD card. Being surrounded by kids is a scary thing for a computer. They don’t necessarily bother to do a nice system shutdown or even exit Scratch before yanking the power. Teachers don’t necessarily know to tell them to; lots of people doing their best with insufficient knowledge.
>>
>> D'oh, now I get it. I was not thinking of the case of yanking the power cord.
>> I can well imagine that this might be a bit disruptive for normal process exit
>> cleanups that are supposed to ensure that fflushed buffers actually make it
>> to the disk-like media.
>>
>>> An interesting thing is that I ‘remembered’ that we flush files when closing them but in fact we don’t.
>>
>> In a perfect world you do not need to flush a file when closing it, because
>> closing it implies a flush (e.g. fclose performs an fflush). That said,
>> yanking the power cord might introduce some imperfections.
>>
>> Dave
> And I don't think that people that yank the power cord should be catered to. If you do stupid things, you should pay the consequences. A computer isn't a toaster. And teachers should convey that to their students.

Are you a parent?  Children are human beings.

I remember when I first joined ParcPlace talking with Phil Yelland who was in charge of the native GUI project that was killed by the DarkPlace-Dodgytalk merger.  He told of a situation where a Windows machine told him he didn't have permission to shut the machine down.  "Really?", he thought.  "Well then you shouldn't leave the power cord unprotected.", and yanked the power cord.

Come on man, be human.  Machines are supposed to serve us, especially itty nutty $35 machines.  Of course youngsters will turn them off (I do by mistake), and of course teachers can't be expected to be systems programmers and be able to explain their consequent insights to very young children.  So yes, the system should use fsync when appropriate, to serve the users, not the inanimate or the partially omniscient.

Re: Adding fsync() call to the primitiveFileFlush prim ?

timrowledge
>
>> And I don't think that people that yank the power cord should be catered to. If you do stupid things, you should pay the consequences. A computer isn't a toaster. And teachers should convey that to their students.
>
> Are you a parent?  Children are human beings.
>
Well I’m not a parent (that I know of) and I’m not *entirely* sure children are human beings, but when they’re my customers and get all teary after losing their painstakingly put together scripts, then I get irritated by the system being less helpful than it could be.

Besides, in a busy classroom (have you ever been in a room full of excited 6-year-olds?) things get tripped over, snagged on waving arms, knocked over in the sheer drama of *making an LED flash!!!!* and so on. So yeah, we should flush everything possible when writing out a script. In fact I should probably see about making autosave of some form now I think of it.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Two wrongs are only the beginning.




Re: Adding fsync() call to the primitiveFileFlush prim ?

Eliot Miranda-2


On May 22, 2016, at 10:16 AM, tim Rowledge <[hidden email]> wrote:

>>
>>> And I don't think that people that yank the power cord should be catered to. If you do stupid things, you should pay the consequences. A computer isn't a toaster. And teachers should convey that to their students.
>>
>> Are you a parent?  Children are human beings.
> Well I’m not a parent (that I know of) and I’m not *entirely* sure children are human beings, but when they’re my customers and get all teary after losing their painstakingly put together scripts, then I get irritated by the system being less helpful than it could be.
>
> Besides, in a busy classroom (have you ever been in a room full of excited 6-year-olds?) things get tripped over, snagged on waving arms, knocked over in the sheer drama of *making an LED flash!!!!* and so on. So yeah, we should flush everything possible when writing out a script. In fact I should probably see about making autosave of some form now I think of it.

Make sure it makes backup copies that don't trash the existing backup copies, before it saves over the image. The plug could get pulled at any point.  Only rename is quick enough and repairable enough to consider atomic.
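The usual way to lean on rename for this is to write the new contents to a temporary file in the same directory, fsync it, and only then rename it over the real name. A minimal POSIX sketch follows; save_atomically is a hypothetical helper, the ".tmp" suffix is an arbitrary choice, and it only covers the atomic replace, not keeping a history of older backups.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: replace dir/name with data so that a power cut leaves
 * either the complete old file or the complete new one on disk.
 * Returns 0 on success, -1 on failure. */
static int save_atomically(const char *dir, const char *name,
                           const void *data, size_t len)
{
    char final_path[1024], tmp_path[1024];
    int n = snprintf(final_path, sizeof final_path, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof final_path)
        return -1;
    n = snprintf(tmp_path, sizeof tmp_path, "%s/%s.tmp", dir, name);
    if (n < 0 || (size_t)n >= sizeof tmp_path)
        return -1;

    /* 1. Write the new contents to a temporary file next to the target. */
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    const char *p = data;
    size_t left = len;
    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w < 0) { close(fd); unlink(tmp_path); return -1; }
        p += w;
        left -= (size_t)w;
    }

    /* 2. Force the data to the device before the name changes. */
    if (fsync(fd) != 0) { close(fd); unlink(tmp_path); return -1; }
    if (close(fd) != 0) { unlink(tmp_path); return -1; }

    /* 3. Atomically swap the new file into place. */
    if (rename(tmp_path, final_path) != 0) { unlink(tmp_path); return -1; }

    /* 4. Sync the directory so the rename itself survives a power cut. */
    int dfd = open(dir, O_RDONLY);
    if (dfd < 0)
        return -1;
    int r = fsync(dfd);
    close(dfd);
    return r == 0 ? 0 : -1;
}
```

How much of this an SD card's own controller honours is a separate question, but it avoids ever having a half-written file under the real name.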

>
> tim
> --
> tim Rowledge; [hidden email]; http://www.rowledge.org/tim
> Two wrongs are only the beginning.
>
>
>


Re: Adding fsync() call to the primitiveFileFlush prim ?

Eliot Miranda-2
In reply to this post by David T. Lewis
Hi David,

> On May 21, 2016, at 9:45 PM, David T. Lewis <[hidden email]> wrote:
>
>> On Sat, May 21, 2016 at 09:47:07AM -0700, marcel.taeumel wrote:
>> Hi Tim,
>>
>> in Windows, this is called FlushFileBuffers, I guess:
>> https://msdn.microsoft.com/en-us/library/windows/desktop/aa364439%28v=vs.85%29.aspx
>
> Slightly off topic, but worth mentioning: The implementation of FilePlugin
> for Windows operates on HANDLE references to files, which I believe are roughly
> equivalent to file descriptors on Unix. Thus the Unix VMs are written to the
> higher level stdio interface, and the Windows VM uses a more direct lower level
> IO strategy. I have always wondered which of the two approaches (low level
> HANDLE/descriptor versus higher level buffered stdio) produces better overall
> performance for Squeak.
>
> One way to answer the question would be to implement a FilePlugin for Unix
> VMs with all of the IO done at the descriptor level. Specifically, a
> SQFile->file would be a reference to an integer file descriptor (similar to
> a Windows HANDLE), and the platform support code would operate against
> file descriptors rather than (FILE *) references.

IME this depends on two things, whether the in-image implementation (StandardFileStream et al) is buffered or not, and whether the system provides proper finalization or simply post-mortem finalization.  Both interact.

If the image-level implementation is not buffered then the VM needs to provide it.  This is essentially our case; the problem being that external calls are relatively slow.  If buffered, then if finalization is performed on a post-mortem copy, close via finalization cannot flush unless the post-mortem copy is updated after every write, because it will flush stale data.

So the design we want, that we should aim towards:
- does all buffering in the image
- uses ephemerons to finalize the actual file so that valid data is written in close via finalization.

With this approach the "FilePlugin" provides only the slimmest of interfaces to the OS's open, close, read, write and seek primitives, and as Tim has pointed out there are advantages in it providing single calls that combine seek;read and seek;write, eg see the current conversation about read-only file copies and the debugger (although I think my suggestion of substituteReadOnlyCopyWhile: is better).
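On POSIX systems the combined seek;read and seek;write calls already exist as pread() and pwrite(), which take an absolute offset and leave the descriptor's own file position untouched. A minimal sketch of the support functions such a slim plugin might wrap (read_at and write_at are hypothetical names, and EINTR is not handled):

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>      /* open() flags, for callers */
#include <sys/types.h>
#include <unistd.h>

/* Read up to count bytes starting at absolute offset. The descriptor's file
 * position is neither used nor changed. Returns bytes read, or -1 on error. */
static ssize_t read_at(int fd, void *buf, size_t count, off_t offset)
{
    char *p = buf;
    size_t done = 0;
    while (done < count) {
        ssize_t n = pread(fd, p + done, count - done, offset + (off_t)done);
        if (n < 0)
            return -1;
        if (n == 0)
            break;                  /* end of file */
        done += (size_t)n;
    }
    return (ssize_t)done;
}

/* Write count bytes at absolute offset. Returns count, or -1 on error. */
static ssize_t write_at(int fd, const void *buf, size_t count, off_t offset)
{
    const char *p = buf;
    size_t done = 0;
    while (done < count) {
        ssize_t n = pwrite(fd, p + done, count - done, offset + (off_t)done);
        if (n <= 0)
            return -1;
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

Because no shared file position is involved, two readers of the same descriptor cannot disturb each other's position, which is relevant to the read-only copy discussion mentioned above.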

>
> Doing a reimplementation of FilePlugin for Unix is probably not a huge
> project, but I have never gotten around to trying it.
>
> Has anyone else wondered about this? Which is better, the Windows VM file
> strategy, or the Unix VM file strategy?
>
> Dave
>
>>
>> MSDN also suggests to use unbuffered I/O instead of calling such a flush
>> function too often. What are our options to control buffered vs. unbuffered
>> from Squeak land?
>>
>> https://support.microsoft.com/en-us/kb/99794
>> https://msdn.microsoft.com/en-us/library/windows/desktop/cc644950%28v=vs.85%29.aspx
>>
>> On what media is the data stored? I think that you cannot be 100% sure to
>> have all data written after some function call returns because some details
>> are out of reach for user applications. Think of some USB driver that needs
>> just two more cycles to finish writing... I am no expert there but it seems
>> tricky to find the correct point in time to turn the power off. Regular OS
>> shutdown seems more appropriate...
>>
>> Best,
>> Marcel

_,,,^..^,,,_ (phone)


Re: Adding fsync() call to the primitiveFileFlush prim ?

timrowledge
In reply to this post by Eliot Miranda-2

> On 22-05-2016, at 12:16 PM, Eliot Miranda <[hidden email]> wrote:
>
> Make sure it makes backup copies that don't trash the existing backup copies, before it saves over the image. The plug could get pulled at any point.  Only rename is quick enough and repairable enough to consider atomic.

The good news is that it’s not the image that needs to be saved since that is kept pristine, but the project file(s), which are much, much smaller. So I should be able to keep a simple backup directory, save there occasionally and then... well, I dunno. If a user does a ‘normal save’ then I suppose one should delete the backups to avoid confusion in the future. I guess checking that directory on startup and telling the user that there are possible project copies? Choosing when to do a backup save is an interesting question too; clearly not when the code is being run since the interruption would be most unwelcome in a Pacman tournament. One could go for the total save-on-every-action saving of Pages etc. but it seems like that might cause too many delays. I’ll have to try measuring it I suppose.


tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
If the code and the comments disagree, then both are probably wrong




FilePlugin IO performance stdio versus HANDLE (was: Adding fsync() call to the primitiveFileFlush prim ?)

David T. Lewis
In reply to this post by Eliot Miranda-2
I think this is what I meant to suggest:

Somebody (tm) should try a new implementation of FilePlugin for the Unix VM,
and implement all of the IO in terms of e.g. read() and write() rather than
fread() and fwrite(). Then see which one works better in terms of real world
performance.

This new plugin would be conceptually similar to Andreas' Windows plugin,
which operates on a Windows HANDLE, similar to a Unix file descriptor.

In general, we can compare the Windows and Unix VMs to see which has the
better real-world file IO performance. But those VMs are different in many
other ways, so if we just want to know the difference between file IO written
to the stdio level versus file IO written to the descriptor/HANDLE level,
then a good way to do it would be to write such a plugin for the Unix VM
and see if it is better or worse than the current stdio implementation.
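A full plugin is the real test, but a crude first impression can be had from a stand-alone timing harness like the sketch below. It only times many small sequential writes through fwrite() versus write(); all names are hypothetical, and it says nothing about reads, seeks or the cost of the primitive call itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Write `chunks` pieces of `size` bytes (at most 4096) through stdio.
 * Returns elapsed seconds, or -1.0 on error. */
static double time_stdio(const char *path, size_t size, int chunks)
{
    char buf[4096] = {0};
    if (size > sizeof buf)
        return -1.0;
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1.0;
    double t0 = now_seconds();
    for (int i = 0; i < chunks; i++)
        if (fwrite(buf, 1, size, f) != size) { fclose(f); return -1.0; }
    if (fflush(f) != 0) { fclose(f); return -1.0; }  /* include the final flush */
    double t1 = now_seconds();
    fclose(f);
    return t1 - t0;
}

/* The same workload with one write() system call per chunk. */
static double time_descriptor(const char *path, size_t size, int chunks)
{
    char buf[4096] = {0};
    if (size > sizeof buf)
        return -1.0;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1.0;
    double t0 = now_seconds();
    for (int i = 0; i < chunks; i++)
        if (write(fd, buf, size) != (ssize_t)size) { close(fd); return -1.0; }
    double t1 = now_seconds();
    close(fd);
    return t1 - t0;
}
```

Calling both with, say, 16-byte chunks and then 4096-byte chunks should show how much the answer depends on write size, which is why buffering in the image matters to the comparison.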

This might be a good student project, or maybe a hobby project for a Sunday
Squeaker with more free Sundays than I have at the moment.

Dave



On Sun, May 22, 2016 at 12:25:49PM -0700, Eliot Miranda wrote:

> Hi David,
>
> > On May 21, 2016, at 9:45 PM, David T. Lewis <[hidden email]> wrote:
> >
> >> On Sat, May 21, 2016 at 09:47:07AM -0700, marcel.taeumel wrote:
> >> Hi Tim,
> >>
> >> in Windows, this is called FlushFileBuffers, I guess:
> >> https://msdn.microsoft.com/en-us/library/windows/desktop/aa364439%28v=vs.85%29.aspx
> >
> > Slightly off topic, but worth mentioning: The implementation of FilePlugin
> > for Windows operates on HANDLE references to files, which I believe are roughly
> > equivalent to file descriptors on Unix. Thus the Unix VMs are written to the
> > higher level stdio interface, and the Windows VM uses a more direct lower level
> > IO strategy. I have always wondered which of the two approaches (low level
> > HANDLE/descriptor versus higher level buffered stdio) produces better overall
> > performance for Squeak.
> >
> > One way to answer the question would be to implement a FilePlugin for Unix
> > VMs with all of the IO done at the descriptor level. Specifically, a
> > SQFile->file would be a reference to an integer file descriptor (similar to
> > a Windows HANDLE), and the platform support code would operate against
> > file descriptors rather than (FILE *) references.
>
> IME this depends on two things, whether the in-image implementation (StandardFileStream et al) is buffered or not, and whether the system provides proper finalization or simply post-mortem finalization.  Both interact.
>
> If the image-level implementation is not buffered then the VM needs to provide it.  This is essentially our case; the problem being that external calls are relatively slow.  If buffered, then if finalization is performed on a post-mortem copy, close via finalization cannot flush unless the post-mortem copy is updated after every write, because it will flush stale data.
>
> So the design we want, that we should aim towards:
> - does all buffering in the image
> - uses ephemerons to finalize the actual file so that valid data is written in close via finalization.
>
> With this approach the "FilePlugin" provides only the slimmest of interfaces to the OS's open, close, read, write and seek primitives, and as Tim has pointed out there are advantages in it providing single calls that combine seek;read and seek;write, eg see the current conversation about read-only file copies and the debugger (although I think my suggestion of substituteReadOnlyCopyWhile: is better).
>
> >
> > Doing a reimplementation of FilePlugin for Unix is probably not a huge
> > project, but I have never gotten around to trying it.
> >
> > Has anyone else wondered about this? Which is better, the Windows VM file
> > strategy, or the Unix VM file strategy?
> >
> > Dave
> >
> >>
> >> MSDN also suggests to use unbuffered I/O instead of calling such a flush
> >> function too often. What are our options to control buffered vs. unbuffered
> >> from Squeak land?
> >>
> >> https://support.microsoft.com/en-us/kb/99794
> >> https://msdn.microsoft.com/en-us/library/windows/desktop/cc644950%28v=vs.85%29.aspx
> >>
> >> On what media is the data stored? I think that you cannot be 100% sure to
> >> have all data written after some function call returns because some details
> >> are out of reach for user applications. Think of some USB driver that needs
> >> just two more cycles to finish writing... I am no expert there but it seems
> >> tricky to find the correct point in time to turn the power off. Regular OS
> >> shutdown seems more appropriate...
> >>
> >> Best,
> >> Marcel
>
> _,,,^..^,,,_ (phone)


Re: Adding fsync() call to the primitiveFileFlush prim ?

Eliot Miranda-2
In reply to this post by timrowledge
Hi Tim, Hi Tim,

    TIM R, I finally looked at the plugin code and the primitive has been there since it was contributed by "monty" in September of 2015.

It's called primitiveFileSync; use as you would for primitiveFileFlush.  So no need to change the VM, but it would be easy to implement such that a command line argument made sqFileFlush call sqFileSync.

Tim F, I note that in platforms/win32/plugins/FilePlugin/sqWin32FilePrims.c sqFileFlush is already implemented as sqFileSync.  Isn't this an error?  Shouldn't we remove the call and simply make sqFileFlush empty on win32?

On Sat, May 21, 2016 at 11:44 AM, tim Rowledge <[hidden email]> wrote:

> On 21-05-2016, at 11:02 AM, Eliot Miranda <[hidden email]> wrote:
> Well, the case is that it /is/ a performance issue.  Writing to disc is way more expensive than flushing to kernel buffers.  How about saying what you think about my c) and d) options?  That's a way of avoiding the performance issue and solving the kids-are-humans issue.

Either would be just fine by me, no problem. Adding a new prim might be even better since it would defer control up to the image, which always seems the best option to me. However, opening the can of rancid worms that is the file system interface doesn’t appeal too much right now. I think for simplicity and testing I’ll stick with a #define and we can revisit it later if it seems to solve any real-world problems.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Useful random insult:- Got into the gene pool while the lifeguard wasn't watching.






--
_,,,^..^,,,_
best, Eliot



Re: Adding fsync() call to the primitiveFileFlush prim ?

timrowledge

> On 23-05-2016, at 4:55 PM, Eliot Miranda <[hidden email]> wrote:
>
> Hi Tim, Hi Tim,
>
>     TIM R, I finally looked at the plugin code and the primitive has been there since it was contributed by "monty" in September of 2015.

Well blow me. So it is. That whole "wood for the trees" thing really does mess us up, right? I had the file open right in front of me as I looked at the flush prim to see if it did a sync etc and simply didn’t see an actual sync prim.

That makes life simpler for me :-)

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Performance is easier to add than clarity.