Hello Hannes,
Sorry for the late response; I have been working intensively on an application using BioSmalltalk. Here is a post with some screenshots:

http://biosmalltalk.blogspot.com.ar/2013/02/phyloclasstalk-preview.html

As I've said, it is developed in Pharo, but most subsystems work in Squeak too. I am cross-posting to the Pharo users list in case someone is interested.

On 16/02/2013 16:00, H. Hirzel wrote:
> Hello Hernán
>
> Thank you for your elaboration on the topic of BioSqueak.
>
> On 2/1/13, Hernán Morales Durand <[hidden email]> wrote:
>>
>> Hello Hannes,
>> Thanks for the feedback! Some answers then between the lines:
>>
>> On 01/02/2013 11:35, H. Hirzel wrote:
>>> Hello Hernán
>>>
>>> This is interesting.
>>> http://biosmalltalk.blogspot.com/
>>>
>>> I understand that you have constructed an internal domain specific
>>> language (a DSL, a query language) for dealing with genetic data in
>>> Smalltalk
>>>
>>> search := BioNCBIWWWBlastClient new nucleotide query:
>>>     'CCCTCAAACAT...TTTGAGGAG';
>>>     hitListSize: 150;
>>>     filterLowComplexity;
>>>     expectValue: 10;
>>>     wordSize: 11;
>>>     blastn;
>>>     blastPlainService;
>>>     alignmentViewFlatQueryAnchored;
>>>     formatTypeXML;
>>>     fetch.
>>> search outputToFile: 'blast-query-result.xml' contents: search result.
>>>
>>> Is there a description of this DSL?
>>
>> It is not a DSL in the traditional sense, i.e., using ANTLR, Lex or Yacc,
>> but a "DSL" which is embedded, thus inheriting the syntax and execution
>> semantics of Smalltalk.
>
> Yes, I understand, the regular thing in Smalltalk, as every Smalltalk
> domain model could be considered a DSL to a certain extent.
>
> Lukas Renggli has a useful classification of DSLs in his PhD dissertation on
> 'Dynamic Language Embedding'
> http://scg.unibe.ch/archive/phd/renggli-phd.pdf
> Chapter 2
>
> According to that you probably have an Internal DSL (chapter 2.1), right?
>

Yes, it would fit into the Internal DSL category. I didn't know about that classification, thanks for sharing.
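To make the "internal DSL" idea concrete outside Smalltalk, here is a minimal sketch in Python of the same pattern: a builder whose methods return the receiver, so calls chain much like a Smalltalk cascade. The class `BlastQuery`, its method names, and the short placeholder sequence are hypothetical and are not part of BioSmalltalk. The upper-case parameter names are assumed to follow the NCBI URL API and should be checked against its specification.

```python
from urllib.parse import urlencode


class BlastQuery:
    """Hypothetical fluent builder; not the BioSmalltalk client."""

    def __init__(self):
        self._params = {}

    def _set(self, key, value):
        self._params[key] = value
        return self  # returning self lets calls chain like a cascade

    def query(self, sequence):
        return self._set("QUERY", sequence)

    def hit_list_size(self, n):
        return self._set("HITLIST_SIZE", n)

    def expect_value(self, e):
        return self._set("EXPECT", e)

    def word_size(self, n):
        return self._set("WORD_SIZE", n)

    def filter_low_complexity(self):
        # "L" as the low-complexity filter value is an assumption
        return self._set("FILTER", "L")

    def blastn(self):
        return self._set("PROGRAM", "blastn")

    def format_type_xml(self):
        return self._set("FORMAT_TYPE", "XML")

    def params(self):
        # Copy, so callers cannot mutate the builder's state
        return dict(self._params)

    def query_string(self):
        return urlencode(self._params)


# Placeholder sequence for illustration only
search = (
    BlastQuery()
    .query("ACGT")
    .hit_list_size(150)
    .filter_low_complexity()
    .expect_value(10)
    .word_size(11)
    .blastn()
    .format_type_xml()
)
```

Because no option is a free-form string typed by the user, a UI could drive the same object by calling these methods, which is the motivation given in this thread for the unary sends.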
>
>> To clarify: I've not built a DSL specification for the QBlast API,
>> although I'm willing to develop DSLs for bioinformatics APIs in a
>> Smalltalk language workbench (anyone?).
>
> OK
>
>> Currently the messages for performing alignments at the NCBI are based
>> on the API specification,
>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node9.html .
>
>> The unary sends are the result of a plan to reduce parametrization and to
>> replicate or customize Blast settings through a UI. This is because
>> geneticists experiment with changing Blast parameters over time and I want
>> my system not to be tied to textual parameters.
>>
>
>>> The data is kept in XML files and
>>> all is read into the image to be queried. It seems that you don't have
>>> a problem with the image size?
>>
>> Yes, I had problems with image size and performance, a lot indeed.
>
>> Actually, working with the XML DOM on alignments of 5000 or more hits,
>> Squeak (and Pharo of course) started to show slowness. So I cannot
>> keep all XML nodes in memory. To overcome this problem I've tried the
>> SAX (push) parser and the XMLPullParser (which is a StAX parser). Then
>> my idea was to reduce the tree by specifying only the XML nodes
>> I'm interested in. After reducing the nodes, I wrote custom XML tree
>> classes with a specific API to query blast XML results, taken from the
>> DTD specification. AFAIK this is known as an XML digester, which is
>> somewhat "evolved" in Java
>> (http://commons.apache.org/digester/xmlrules.html).
>
> I understand that you took
> http://www.squeaksource.com/XMLSupport/
> (the XML support repo for Pharo; for Squeak, XML support is in
> the trunk image)
> and modified it.
>
>> I have built a
>> dynamic query builder in Morphic for querying the XML, providing the
>> possibility to persist and update the filters. Unfortunately for Squeak
>> users I'm using the Polymorph API, which I think is not available in
>> Squeak.
>
> A screen shot would be appreciated...
> :-)
>

Ok, the blog post includes some screenshots.

>> We worked using the XML push/pull parsers for reading genomes and they
>> worked acceptably. But it is impossible to keep nodes for 3 GBytes of
>> XML, at least for now, in Squeak/Pharo.
>
> According to my experience keeping XML structures in the image is
> inefficient in terms of memory usage. More efficient ways are needed
> and XML is then only for reading/writing to external files.
>

Exactly, XML is not good at all for big data.

>> More critical problems arise when trying to work with microarray
>> data (big data), which is not document-oriented, in Smalltalk. I had to
>> switch to "solutions" like SQL, or HDF5 using PyTables with a
>> well-designed schema for our input. The advantages are that they support
>> indexing and reading data in blocks, besides tools like ViTables or
>> HDFView to navigate the data. Until someone provides some bits in this
>> field, there is little opportunity for using Smalltalk.
>
> But what I understand is that people keep DNA data in memory for speed
> reasons and use C++ or Perl programs to deal with it.
>

It really depends on the type of analysis. I've seen that most beginning bioinformaticians prefer Python over Perl because of the nicer syntax and more complete library support. I don't know of any big data projects using C++ with raw DNA data. Compression with indexing and specialized file formats are used these days, splitting data in clusters where needed. I would love to see some Smalltalkers working on dataspaces too. See these presentations:

http://www.slideshare.net/mndoci/presentations

>>> I would welcome a short writeup with a general introduction to what
>>> you are doing in http://biosmalltalk.blogspot.com/.
>
>>
>> We have submitted a paper recently and we are waiting for the review
>> results. Meanwhile, we are preparing another paper for a
>> phylogenetics decision support system which includes text-mining and a
>> rule engine.
>> I will try to write an entry in the next week with
>> screenshots.
>
> Any news on this?
>

No news so far, still in the reviewing process.

Best regards,

Hernán

> Kind regards
> Hannes
>
>> Best regards,
>>
>> Hernán
>>
>>> Kind regards
>>>
>>> Hannes Hirzel
>>>
>>> On 2/1/13, Hernán Morales Durand <[hidden email]> wrote:
>>>> Hi,
>>>>
>>>> A few days ago I created a port of BioSmalltalk for Squeak too.
>>>> BioSmalltalk is a library for doing Bioinformatics with Smalltalk. This
>>>> port is labelled "BioSqueak" and I expect to release a version for
>>>> Windows sometime soon. You can find it in:
>>>>
>>>> http://code.google.com/p/biosmalltalk/downloads/list
>>>>
>>>> I'm very interested in feedback.
>>>> Thanks for reading.
>>>>
>>>> Hernán
>>>>
>>>> --
>>>> Hernán Morales
>>>> Institute of Veterinary Genetics (IGEVET)
>>>> http://igevet.fcv.unlp.edu.ar
>>>> National Scientific and Technical Research Council (CONICET).
>>>> La Plata (1900), Buenos Aires, Argentina.
>>>> Telephone: +54 (0221) 421-1799.
>>>> Internal: 422
>>>> Fax: 425-7980 or 421-1799.
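On the XML memory issue discussed earlier in this thread (reading only the nodes of interest with a push/pull parser instead of keeping a full DOM), the following is a minimal sketch of that idea using Python's standard-library `xml.etree.ElementTree.iterparse`. It is not the Squeak/Pharo XMLSupport API. The inline sample is made up and heavily simplified, and the element names `Hit`, `Hit_id` and `Hit_len` are assumed to match BLAST XML output.

```python
import io
import xml.etree.ElementTree as ET

# Made-up, simplified sample; real BLAST XML has many more elements
SAMPLE = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_id>id1</Hit_id>
          <Hit_len>120</Hit_len>
          <Hit_hsps><Hsp><Hsp_evalue>1e-5</Hsp_evalue></Hsp></Hit_hsps>
        </Hit>
        <Hit>
          <Hit_id>id2</Hit_id>
          <Hit_len>80</Hit_len>
          <Hit_hsps><Hsp><Hsp_evalue>0.5</Hsp_evalue></Hsp></Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""

# Only these child elements of each hit are kept (an arbitrary choice)
WANTED = ("Hit_id", "Hit_len")


def iter_hits(source, wanted=WANTED):
    """Yield one small dict per <Hit>, discarding its subtree afterwards."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "Hit":
            # The "end" event fires once the whole hit has been read
            yield {tag: elem.findtext(tag) for tag in wanted}
            # Drop the hit's children; the emptied element itself
            # stays attached to its parent
            elem.clear()


hits = list(iter_hits(io.BytesIO(SAMPLE.encode("utf-8"))))
```

Each hit is reduced to a small record as soon as its closing tag is seen, so memory use depends on the fields kept rather than on the size of the full document tree.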