Greetings,
I am writing a program to consolidate all my personal finances for tax time next year. (This is not a school project.)

There are transaction files from several banks and credit card companies. Each has a similar format, CSV, but they vary in many ways: order of items, extra notes, pipe delimited or tabs, etc. I want to read them and load them into a collection of transaction objects.

1. Should I have a FileReader object?
2. Should it have subclasses like FileReaderAmericanExpress, FileReaderJPMorgan?
3. Or should it have different methods like loadAmericanExpresFile, loadJPMorganFile?
4. Is a Collection of Transaction objects the structure that you would load the files into?

The rest of the project would be to do data checking on the files, to make sure there are no duplicates or missing dates. Then write reports that I can give to my accountant.

I would appreciate some design help.

Sincerely,
Joe.

_______________________________________________
Beginners mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/beginners
Joe,

My suggestion is to use fewer object classes and more methods. Play with it until you know what you are doing; then objects, instance variables, and methods come more naturally, without as much need for prior design. You can refactor or reorganize fairly quickly once you master the application and the tools available. Sometimes it is necessary to write new versions of an application as you learn more about what you need.

Included in this advice is a suggestion to keep working your software over and over in Smalltalk and not to abstract it on paper too much, which can waste time. A design embedded in software is a lot closer to working than a design on paper (or CRC cards). But occasionally a few notes on paper can help when going from one version to a new one.

Kirk

On Thu, Apr 28, 2016 at 3:15 PM, Joseph Alotta <[hidden email]> wrote:
> Greetings,
In reply to this post by Joseph Alotta
Hi Joe,
I agree with Kirk's suggestions in general. See more below.

On Thu, 28 Apr 2016 17:15:05 -0500, Joseph Alotta <[hidden email]> wrote:

>Greetings,
>I am writing a program to consolidate all my personal finances for tax time next year. (This is not a school project.)
>There are transaction files from several banks and credit card companies. Each has a similar format, CSV, but they vary in many ways, order of items, extra notes, pipe delimited or tabs, etc. I want to read them and load them into a collection of transaction objects.
>1. Should I have a FileReader object?
>2. Should it have subclasses like FileReaderAmericanExpress, FileReaderJPMorgan ?

No. I would start with the object you want to hold the information and work out from there. An object, or your main program object, should read the files and instantiate the objects (all of the same class) that get the data.

>3. Or should it have different methods like loadAmericanExpresFile, loadJPMorganFile ?

Yes. Use different methods to read and parse the files and instantiate the objects. Depending on how similar the files are, you could have one method with a few parameters (three or four at most) that handles more than one file format.

>4. Is a Collection of Transaction objects, the structure that you would load the files into?

Yes. An ordered or sorted collection would be good, so you could sort by date/time if that is helpful.

>The rest of the project would be to do data checking on the files, to make sure there are no duplicates or missing dates. Then write reports that I can give to my accountant.

Sounds good.

>I would appreciate some design help?
>Sincerely,
>Joe.

Lou
--
Louis LaBrunda
Keystone Software Corp.
SkypeMe callto://PhotonDemon
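Lou's suggestion of one loading method with a few parameters, filling a sorted collection of objects of a single class, can be sketched roughly as follows. This is only an illustration, written in Python rather than Smalltalk; every name in it (Transaction, load_transactions, the column map, the date formats, and the sample rows) is hypothetical and not taken from the thread.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal


@dataclass(frozen=True)
class Transaction:
    """One row from any bank file, normalized to a common shape."""
    when: date
    payee: str
    amount: Decimal


def load_transactions(text, delimiter, columns, date_format):
    """Parse delimited text into a date-sorted list of Transaction objects.

    columns maps each field name ('date', 'payee', 'amount') to its
    zero-based column index in this bank's file (an assumed layout).
    """
    result = []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        try:
            when = datetime.strptime(row[columns["date"]].strip(), date_format).date()
        except ValueError:
            continue  # not a date, so treat the row as a header or note
        result.append(Transaction(
            when=when,
            payee=row[columns["payee"]].strip(),
            amount=Decimal(row[columns["amount"]].strip()),
        ))
    return sorted(result, key=lambda t: t.when)


# Two made-up files with different delimiters and column orders.
amex = "Date|Payee|Amount\n03/02/2016|Coffee Shop|-3.50\n01/15/2016|Bookstore|-20.00\n"
chase = "Amount\tDate\tPayee\n-12.25\t2016-02-01\tGrocer\n"

transactions = (
    load_transactions(amex, "|", {"date": 0, "payee": 1, "amount": 2}, "%m/%d/%Y")
    + load_transactions(chase, "\t", {"date": 1, "payee": 2, "amount": 0}, "%Y-%m-%d")
)
transactions.sort(key=lambda t: t.when)
```

The three parameters (delimiter, column positions, date format) are the only things that differ between the two sample files, so one method covers both. A file that differs in some other way might still deserve its own method, as Lou notes.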
In reply to this post by Joseph Alotta
Hi Joe,
Depending on how many different structures you have, you may want to consider having some external configuration object. My first thought when reading your description was that there are only so many ways to parse, and only so many different types of data. Having a collection of parser objects that handle specific translations seems clean. Once you have an idea of what types of translations you need, you could trigger those operations by building your collection of parser objects from an external config file. That way you read the parameters in from a file (it could be a file in the JPMorgan directory called conf.xml or parse.conf, something like that) and then use them to set up your parser. The parser is then written to be generic and reusable. It would allow you to write methods once and then reuse them for files you haven't seen yet by creating a new config file for your format. You could even have a config file generator that asks you questions and shows you the results from a current sample data file. :)

    fieldSeperator: #comma.
    fieldDelimited: #doubleQuote.
    nameSeperator: #comma.
    nameFormat: 'title, first, [mi], last, [suffix]'.
    balance: #USD.
    fieldOrder: 'id, name, balance'.
    ...

One other thought: if there is a way for the system to determine what format to use (this is a JPMorgan file), then instead of an external config file you could just store the different configs internally, match for the right config collection, and raise an error if one doesn't exist (asking the user to create one using your config builder method). If you can't match, then having a file in a JPMorgan directory seems simple enough.

In general, thinking of the setup step as a collection of generic parser configuration objects, instead of a different parsing method for each file, simplifies everything. Of course, back to my original point: writing 3 parser methods will go faster if there are not that many formats. There is always a trade-off when you consider building a framework or just hacking some code that works :).
All the best,
Ron Teitelbaum

> From: Joseph Alotta
> Sent: Thursday, April 28, 2016 6:15 PM
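Ron's idea of storing one configuration per format, matching the right one, and raising an error when none fits might look something like the sketch below. It is illustrative Python, not code from the thread: the FormatConfig fields, the FORMATS table, and the header strings used for matching are all invented for the example, and real bank headers will differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormatConfig:
    """Describes one file layout; the generic parser reads only this."""
    name: str
    separator: str
    field_order: tuple  # field names in the order they appear in the file


# Hypothetical configurations; a real one would be written per bank.
FORMATS = [
    FormatConfig("AmericanExpress", "|", ("date", "payee", "amount")),
    FormatConfig("JPMorgan", "\t", ("amount", "date", "payee", "notes")),
]


def detect_format(header_line):
    """Return the config whose separator and field names match the header."""
    for config in FORMATS:
        names = tuple(part.strip().lower() for part in header_line.split(config.separator))
        if names == config.field_order:
            return config
    raise LookupError(f"no parser configuration matches header: {header_line!r}")


def parse_line(line, config):
    """Split one data line and label each value with its field name."""
    values = [part.strip() for part in line.split(config.separator)]
    return dict(zip(config.field_order, values))


config = detect_format("Amount\tDate\tPayee\tNotes")
record = parse_line("-12.25\t2016-02-01\tGrocer\tweekly shop", config)
```

Supporting a new bank then means adding one FormatConfig entry rather than a new method. The plain split used here ignores quoted fields, so a file that wraps values in double quotes would need a proper CSV reader in parse_line.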
Thanks for all the help.
I like the idea of having the code sense the format of the data and act accordingly. For separators, I could count the occurrences of each kind of separator in the file and compare that to the number of lines, say 3 or more separators per line. Then I can parse by columns and look for the dominant data type. For a column where 60% of the values match a date type, I can assume it is a date column and the mismatches are headers. The amount should be numeric. The payee should be mostly letters, etc.

One issue I have is knowing what to call the object that does this. It would not be a Transaction, because this is a function of many Transactions. FileLoader? FileAnalyzer?

Also, at this point I should be looking for missing dates and duplicates. Duplicates are troublesome, since every time I download the file, it starts from the beginning of the year again. I keep downloading them because I think the banks will only keep data for 6 months or so. Also, some duplicate transactions are valid. Suppose I go into a coffee shop and buy a cup of coffee, then go back to the same store the same day for a refill.

Your thoughts?

Sincerely,
Joe.
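The format-sensing idea Joe describes (count separators per line, then label each column by its dominant data type using a 60% threshold) can be sketched as below. This is a rough Python illustration under stated assumptions: the candidate separator list, the function names, the default date format, and the sample lines are all hypothetical, and falling back to "text" when no type reaches the threshold is a choice made for the example.

```python
import re
from collections import Counter
from datetime import datetime

CANDIDATE_SEPARATORS = [",", "|", "\t", ";"]
NUMBER = re.compile(r"[-+]?\$?\d[\d,]*(\.\d+)?")


def guess_separator(lines, minimum_per_line=3):
    """Pick the candidate that appears at least minimum_per_line times on the most lines."""
    best, best_hits = None, 0
    for separator in CANDIDATE_SEPARATORS:
        hits = sum(1 for line in lines if line.count(separator) >= minimum_per_line)
        if hits > best_hits:
            best, best_hits = separator, hits
    return best


def classify(value, date_format):
    """Label a single cell as 'date', 'number', or 'text'."""
    value = value.strip()
    try:
        datetime.strptime(value, date_format)
        return "date"
    except ValueError:
        pass
    return "number" if NUMBER.fullmatch(value) else "text"


def guess_column_kinds(lines, separator, date_format="%m/%d/%Y", threshold=0.6):
    """Give each column the kind that at least `threshold` of its cells match."""
    rows = [line.split(separator) for line in lines]
    width = max(len(row) for row in rows)
    kinds = []
    for index in range(width):
        cells = [row[index] for row in rows if index < len(row)]
        kind, count = Counter(classify(c, date_format) for c in cells).most_common(1)[0]
        kinds.append(kind if count / len(cells) >= threshold else "text")
    return kinds


# Made-up sample: a header row plus three data rows.
sample = [
    "Date|Payee|Amount|Notes",
    "01/15/2016|Bookstore|-20.00|gift",
    "03/02/2016|Coffee Shop|-3.50|",
    "03/02/2016|Coffee Shop|-3.50|refill",
]
separator = guess_separator(sample)
kinds = guess_column_kinds(sample, separator)
```

In the sample the header row is the one mismatch in the date and amount columns, which is the situation Joe anticipates. Note that the 3-separators-per-line rule only works for files with at least four columns.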
Hi Joseph,
I'm making some data visualizations and, although I don't have advice on conceptual design, I share part of the practical problem of having to work with CSV values in a Smalltalk environment, sometimes with a lot of records (my recent project works with 270k of them). The visualization I did is documented broadly at [1], but essentially I created a "PublishedMedInfo class >> loadDataFromCSV: aFile usingDelimiter: aCharacter" method that fills in my domain objects from an Excel (and then CSV) file.

[1] http://mutabit.com/offray/blog/en/entry/sdv-infomed

For my recent project [2] I'm using a SQLite bridge between Pharo and the data imported from CSV. That way I'm delegating storage and querying (including duplicates) to a small but potent database back-end, while using objects to model the "higher" concerns of my domain. I know there are worries about the object-database impedance mismatch, but working with data and its visualization/reporting lets you build bridges, delegating the former to the database and the latter to objects, while using the strengths of each in its own place.

[2] https://twitter.com/offrayLC/status/725314838696701957

So my practical advice is to explore this kind of combination early in your design. Maybe a quick hands-on mockup could let you know if it works for you. In my case it has, and I'm now introducing it earlier in my projects.

Cheers,
Offray

Ps: Long time without writing, but I have been reading constantly. Nice to be "back" :-)

On 29/04/16 09:28, Joseph Alotta wrote:
> Thanks for all the help.
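Offray's suggestion of delegating storage and duplicate handling to SQLite could be tried out with a quick mockup like the one below. It is a sketch in Python using the standard sqlite3 module, not the Pharo bridge he mentions; the table layout, file names, and rows are made up. The merge rule (keep the largest number of identical rows seen in any single download) is an assumption chosen to fit Joe's description that each download starts again from the beginning of the year; it is not something proposed in the thread.

```python
import sqlite3

connection = sqlite3.connect(":memory:")  # use a file path to keep the data
connection.execute(
    "CREATE TABLE transactions (source TEXT, day TEXT, payee TEXT, amount TEXT)"
)

# Made-up rows: two overlapping downloads of the same account, where the
# second download repeats everything in the first and adds newer rows.
january_download = [
    ("2016-01-15", "Bookstore", "-20.00"),
    ("2016-01-20", "Coffee Shop", "-3.50"),
    ("2016-01-20", "Coffee Shop", "-3.50"),  # a genuine refill on the same day
]
february_download = january_download + [("2016-02-01", "Grocer", "-12.25")]

for source, rows in (("jan.csv", january_download), ("feb.csv", february_download)):
    connection.executemany(
        "INSERT INTO transactions VALUES (?, ?, ?, ?)",
        [(source, *row) for row in rows],
    )

# Count identical rows within each file, then keep the largest count seen in
# any one file. Rows repeated across downloads collapse, while rows repeated
# inside a single download (the refill) are preserved.
merged = connection.execute(
    """
    SELECT day, payee, amount, MAX(copies) AS copies
    FROM (
        SELECT source, day, payee, amount, COUNT(*) AS copies
        FROM transactions
        GROUP BY source, day, payee, amount
    ) AS per_file
    GROUP BY day, payee, amount
    ORDER BY day, payee
    """
).fetchall()
```

Rows with a count above one are the candidates to review by hand, since the query cannot tell a real second purchase from a bank-side repeat. The rule also breaks down if a download does not start from the beginning of the year, because identical purchases split across two non-overlapping files would then be undercounted.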