I am refreshing my familiarity with streams by looking at PharoByExample1, which says...
"You have to remember that each time you open a stream on a file, you have to close it too."

Does that mean if in a Workspace I iteratively develop some code to process a stream, so that I am continually running...

  myStream := FileStream readOnlyFileNamed: 'C:\test.txt'.

...so that 'myStream' is overwritten each time, such that I lose the reference to the previous stream and am no longer able to send #close to it, am I creating a memory leak in my image?

My prior assumption has been that once 'myStream' drops the reference to the previous stream, garbage collection would take care of 'everything'.

cheers -ben
Ben,
On 03 Nov 2012, at 03:23, Ben Coman <[hidden email]> wrote:

> Does that mean if in a Workspace I iteratively develop some code to process a stream [...] am I creating a memory leak in my image?

Yes, external resources need to be closed, even though finalization (the garbage collector telling an object that it is about to become garbage) will try to clean up as well, but it might take some time.

The proper idiom to use is

  FileStream readOnlyFileNamed: 'C:\test.txt' do: [ :stream | ... ].

or in Pharo 2.0

  'C:\test.txt' asFileReference readStreamDo: [ :stream | ... ].
  'C:\test.txt' asFileReference writeStreamDo: [ :stream | ... ].

HTH,

Sven

--
Sven Van Caekenberghe
http://stfx.eu
Smalltalk is the Red Pill
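Where the block-based idiom does not fit (for example, when the stream must stay open across several Workspace evaluations), the manual pattern can be protected with #ensure: so the close always runs. This is a minimal sketch, assuming the same classic FileStream API used in this thread:

```smalltalk
"Manual open/close, guarded with #ensure: so the stream is closed
 even if the processing block signals an error."
| stream firstLine |
stream := FileStream readOnlyFileNamed: 'C:\test.txt'.
[ firstLine := stream nextLine ]
    ensure: [ stream close ]
```

The do:-style methods shown above do essentially this on the caller's behalf, which is why they are the preferred idiom.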
Thanks Sven... and also thanks very much for your NeoCSV package [1] [2]. Coincidentally, my original question was related to practicing how to use NeoCSV to read some tab-separated data. A bit of feedback on NeoCSV...

1. Very easy to understand, and I loved the #recordClass feature returning a model object.

2. Nicely documented [1]. Also, perhaps mention the repository location [3] and that it can be found under Pharo Tools > Configuration Browser.

3. My data records had blanks in a lot of fields (e.g. data1,data2,,data4), so that #addFloatField: was failing. So I extended it with the following method...

  addFloatField: accessor ifFail: failBlock
      self
          addField: accessor
          converter: [ :string | Float readFrom: string ifFail: failBlock ]

Perhaps you could add something similar, as well as a default 'failBlock' (maybe one each for Integer/Float).

4. Before I added (3.), the debugger would open about eight levels deeper than required to identify the culprit in the file. It would be useful if a custom Exception could be raised that held the line and accessor that was being processed (not that I know enough about Exceptions to know if that would be possible). That might be a good default 'failBlock'.

[1] https://github.com/svenvc/docs/blob/master/neo/neo-csv-paper.md
[2] http://forum.world.st/ANN-NeoCSV-td4636208.html
[3] http://mc.stfx.eu/Neo

Btw, based on your response to my original post, while I was playing in the Workspace I ended up using it like this...

----change test file source----
  csvStream close.
  csvStream := FileStream readOnlyFileNamed: 'C:\#ENG8002\Test Data\Motors\motors.TXT'.

----iteratively-build-and-test-field-list----
  csvStream reset.
  csvStream nextLine. "ignore headers"
  reader := (NeoCSVReader on: csvStream)
      separator: Character tab ;
      recordClass: LEMotorData ;
      addIntegerField: #idMotor: ifFail: [ nil ] ;
      addFloatField: #hp: ifFail: [ nil ] ;
      ...etc
  reader upToEnd explore.
----etc----

Thanks again.
cheers -ben

Sven Van Caekenberghe wrote:
> Yes, external resources need to be closed, even though finalization (the garbage collector telling an object that it is about to become garbage) will try to clean up as well but it might take some time. [...]
Continuing on... (The kids came in and I sent the last post off before I finished :)

5. Just dreaming... but 80% of my time using NeoCSV was spent reviewing the data, transferring field headings to camelCase for the class definition, and typing in each #addField: method. Since a lot of csv files have headings on the first line, something like a class NeoCSVGenerator that reads the headings to create a class with matching instance variables would be really nice. In the first instance it could just output a string for the user to manually copy/execute in the right location. Maybe later push it through the refactoring browser. For example...

  csvgen := NeoCSVGenerator createClass: MotorData on: csvStream.
  csvgen classDefinition inspect.
  csvgen methodDefinition inspect.

#methodDefinition could read in the whole sample file and add a comment to each field line showing the Integer/Float/blank statistics, to assist with decision making...

  addField: #horsePower: ; "100 floats, 3 integers, 2 blanks"

In the new year (after I have handed in my thesis), you might bump me to add this myself (if you are interested but hadn't done it already).

cheers -ben

Ben Coman wrote:
> Thanks Sven... and also thanks very much for your NeoCSV package [1] [2]. Coincidently my original question was related to practicing how to use NeoCSV to read some tab-separated data. A bit of feedback on NeoCSV... [...]
In reply to this post by Ben Coman
Hi Ben,
Thanks for all the feedback, I am happy that NeoCSV is useful for you.

Regarding your problem, I tried a related but different solution. When a user specifies that a field should be an integer or a float, IMO you want an error in case the input cannot be parsed. But indeed, empty/missing fields are an exception. I implemented, as the default, the option to skip further processing of empty fields. No configuration needed.

Please have a look and tell me if that solves your use case.

I committed the following:

===
Name: Neo-CSV-Core-SvenVanCaekenberghe.10
Author: SvenVanCaekenberghe
Time: 4 November 2012, 8:12:54.095 pm
UUID: e73487f9-57d1-40f1-bc0e-578c97c08721
Ancestors: Neo-CSV-Core-SvenVanCaekenberghe.9

changed #readNextRecord and #readNextRecordAsObject to not do anything if empty/missing fields are read;
this also means that empty strings are not passed to converters or field accessors;
this solves the problem reported by Ben Coman where missing integer/float fields gave errors
===
Name: Neo-CSV-Tests-SvenVanCaekenberghe.9
Author: SvenVanCaekenberghe
Time: 4 November 2012, 8:13:54.008 pm
UUID: ffaaa773-d3f6-4b10-adaa-bd2588af8eb5
Ancestors: Neo-CSV-Tests-SvenVanCaekenberghe.8

added #testEmptyConversions and #testEmptyConversionsTestObject;
changed #readNextRecord and #readNextRecordAsObject to not do anything if empty/missing fields are read;
this also means that empty strings are not passed to converters or field accessors;
this solves the problem reported by Ben Coman where missing integer/float fields gave errors
===

Regards,

Sven

--
Sven Van Caekenberghe
http://stfx.eu
Smalltalk is the Red Pill

On 04 Nov 2012, at 03:32, Ben Coman <[hidden email]> wrote:

> Thanks Sven... and also thanks very much for your NeoCSV package [1] [2]. Coincidently my original question was related to practicing how to use NeoCSV to read some tab-separated data. A bit of feedback on NeoCSV... [...]
In reply to this post by Ben Coman
On 04 Nov 2012, at 04:10, Ben Coman <[hidden email]> wrote:

> 5. Just dreaming... but since 80% of my time using NeoCSV was reviewing the data, transferring field headings to camelCase for the class definition, typing in each #addField: method [...] something like a class NeoCSVGenerator that read the headings to create a class with matching instance variables would be really nice.

That might indeed be an option. It goes a bit further than what I want to add to NeoCSV, and it might be hard to make it portable, but if one needs to parse many different CSV files with many fields, it could save a lot of work.

Another, less efficient idea would be to create dictionaries for each record, with the field names as keys.

--
Sven Van Caekenberghe
http://stfx.eu
Smalltalk is the Red Pill
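The dictionaries-for-records idea can be sketched with plain collection code on top of an unconfigured reader. Hedged: this assumes that a NeoCSVReader with no fields configured yields each row as a plain record of strings, and that the first row of the input is the header:

```smalltalk
"Hedged sketch of records-as-dictionaries: the header row supplies
 the keys, each subsequent data row supplies the values."
| reader fieldNames dictionaries |
reader := NeoCSVReader on: 'id,hp
1,0.75
2,1.5' readStream.
fieldNames := reader next.  "header row, read as an ordinary record"
dictionaries := reader upToEnd collect: [ :row |
    | dict |
    dict := Dictionary new.
    fieldNames with: row do: [ :name :value | dict at: name put: value ].
    dict ]
```

This trades memory (keys repeated per record) for not having to define a record class up front, which matches the "less efficient" caveat above.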
In reply to this post by Ben Coman
On Sat, Nov 3, 2012 at 3:23 AM, Ben Coman <[hidden email]> wrote:
> My prior assumption has been that once 'myStream' drops the reference to the previous stream, garbage collection would take care of 'everything'.

Unless you do it very, very frequently, you should be fine. But in real source code, pay attention to always close manually.

--
Damien Cassou
http://damiencassou.seasidehosting.st

"Success is the ability to go from one failure to another without losing enthusiasm."
Winston Churchill
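To make the failure mode concrete, here is a hedged sketch of the Workspace scenario (assuming the classic FileStream API): re-evaluating the open expression orphans the previous stream, whose OS file handle stays open until finalization eventually runs.

```smalltalk
"Each evaluation opens a new OS file handle; re-binding myStream
 orphans the previous stream. Its handle is only released when the
 garbage collector finalizes the orphan, which may take a while."
| myStream |
myStream := FileStream readOnlyFileNamed: 'C:\test.txt'.
myStream := FileStream readOnlyFileNamed: 'C:\test.txt'. "first handle now orphaned"
Smalltalk garbageCollect. "nudges finalization of the orphaned stream"
myStream close "explicitly close the stream you still reference"
```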
In reply to this post by Sven Van Caekenberghe-2
That works very well. Thanks Sven.
Sven Van Caekenberghe wrote:
> Hi Ben,
>
> Thanks for all the feedback, I am happy that NeoCSV is useful for you.
>
> Regarding your problem, I tried a related but different solution. When a user specifies that a field should be an integer or a float, IMO you want an error in case the input cannot be parsed. But indeed, empty/missing fields are an exception. I implemented the default option to skip further processing empty fields. No configuration needed. [...]
In reply to this post by Damien Cassou
Thanks for the clarification.
Damien Cassou wrote:
> unless you do it very very frequently, you should be fine. But in real
> source code, pay attention to always close manually.