Hello,

I must confess that I have not fully RTFM, so I apologize if my question has already been asked or answered.

I used Smalltalk about 25 years ago and still have the books by Goldberg and Lalonde. Since then I have watched its development but have not followed it actively. In recent years I switched to Python, also using NLTK.

My main problem is organizing information that takes the form of many text objects. For this I have relied heavily on Emacs with Org mode and on my favorite, Scrivener (https://www.literatureandlatte.com/scrivener/overview).

I am still looking for an integrated environment in which to write, organize and analyse text. I am sure that everything is in Pharo, and that whatever is missing can be programmed.

I understand that Smalltalk is an IDE, but I have not found pointers on using Pharo as a standard desktop. I found Grafoscopio, which seems to be a basis for the work I do, but I still have not found tools for standard text processing, file management, dictionary lookup, etc.

I am also still missing (or have not found) working examples in the classes, so that when I am unsure what a class really stands for, I could run an example and start digging. For instance, so far I have not been able to import my Org files and see what the parser does.

So, are there documents that explain where to find an editor and markup tags, so that I can import my text base, start playing with my text within Pharo, and also use it as a working environment?

Thanks in advance,
Hajo

---
That is well said, but we must cultivate our garden.

http://hajos-kontrapunkte.blogspot.de/
Hello Hajo,
2018-03-22 14:54 GMT-03:00 Hajo Dezelski <[hidden email]>:
> I am still looking for an integrated environment to
> write/organize/analyse text. And I am sure that everything is in Pharo
> and what is missing can be programmed.

Which kind of text analysis or organization do you want to do? NLP? FRBR?

There are several options for text processing:

There is NaturalSmalltalk, with a stemmer, TF-IDF, supervised and unsupervised classifiers, k-means clustering, naive Bayes, etc.

I haven't checked it, but this project, https://github.com/mark-watson/nlp_smalltalk, claims support for NER, POS, segmentation and summarization.

There is Moose-Algos-InformationRetrieval (formerly Hapax), with stemmers and corpus support.

Maybe you can install it by evaluating:

    Metacello new
        configuration: 'MooseAlgos';
        smalltalkhubUser: 'Moose' project: 'MooseAlgos';
        version: #development;
        load: 'Moose-Tests-Algos-Graph'

> So are there some documents where it is explained where to find an
> editor, markup-tags, so that I can import my text base and can start
> playing with my text within Pharo and use it also as a working
> environment.

If the above doesn't fit your requirements, could you tell us what type of text you have?

Cheers,

Hernán
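To make the TF-IDF option mentioned above concrete, here is a minimal sketch in plain Python. It is not the NaturalSmalltalk or Moose-Algos API; the function names (`tokenize`, `tf_idf`) and the three sample notes are invented for illustration, and it does no stemming.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into word tokens (no stemming)."""
    return re.findall(r"\w+", text.lower())


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = occurrences of the term / number of tokens in the document
    idf = log(number of documents / number of documents containing the term)
    """
    token_lists = [tokenize(doc) for doc in documents]
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))  # count each term once per document
    n_docs = len(documents)
    weights = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty notes
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical sample notes, only to exercise the function.
notes = [
    "Voltaire wrote Candide",
    "Candide tends a garden",
    "The garden needs water",
]
scores = tf_idf(notes)
```

A term that occurs in every note gets weight zero, so the highest-weighted terms of a note are the ones that set it apart from the rest of the collection.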
Thanks Hernán,
for the hint. I will have a look at it.

I have a large "database" (currently about 1 GB) of notes and articles in three languages, all in plain text and organized thematically in about 40 *.org files. Some of them have keywords, but mostly I search for lemmata to gather material for new articles, to reorganize topics, etc. In recent years I have also added pictures that belong to the text, but I could do that only in Scrivener.

My problem is that I did not apply keywords consistently and have lost track of where I placed text fragments on different topics. So I am trying to reorganize this mess in an integrated environment where I can search using my own knowledge or NLTK functions.

But that's my problem. I have a starting point, and when I hit another barrier I will ask more specific questions.

Cheers,
Hajo

On Thu, Mar 22, 2018 at 8:35 PM, Hernán Morales Durand <[hidden email]> wrote:
> Which kind of text analysis/organization you want to do? NLP? FRBR?
>
> If the above doesn't fit your requirements could you comment which
> type of text do you have?
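The situation described above (keywords applied inconsistently across Org files) can be explored with a small script before anything is imported into Pharo. The following is a minimal sketch in Python, assuming standard Org headlines (leading stars, optional trailing `:tag1:tag2:`). The names `Fragment`, `parse_org`, `index_by_tag` and `untagged_mentions` are hypothetical, the sample text is invented, and the search matches surface words rather than lemmata.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field

# Org headline: one or more stars, a space, the title, optional trailing :tag1:tag2:
HEADLINE = re.compile(r"^(\*+)\s+(.*?)(?:\s+:([\w@#%:]+):)?\s*$")


@dataclass
class Fragment:
    level: int
    title: str
    tags: list
    body: list = field(default_factory=list)


def parse_org(text):
    """Split Org text into fragments, one per headline (text before the first headline is skipped)."""
    fragments = []
    for line in text.splitlines():
        match = HEADLINE.match(line)
        if match:
            stars, title, tags = match.groups()
            fragments.append(Fragment(len(stars), title, tags.split(":") if tags else []))
        elif fragments:
            fragments[-1].body.append(line)
    return fragments


def index_by_tag(fragments):
    """Map each tag to the titles of the fragments that carry it."""
    index = defaultdict(list)
    for fragment in fragments:
        for tag in fragment.tags:
            index[tag].append(fragment.title)
    return dict(index)


def untagged_mentions(fragments, word):
    """Titles of fragments that mention `word` but have no tags of their own."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return [f.title for f in fragments
            if not f.tags and pattern.search(f.title + "\n" + "\n".join(f.body))]


# Hypothetical sample, standing in for one of the *.org files.
SAMPLE = """\
* Candide :voltaire:novel:
Notes on the last chapter.
** The garden
We must cultivate our garden, says Voltaire.
* Essays :montaigne:
On friendship.
"""
fragments = parse_org(SAMPLE)
```

Here `untagged_mentions(fragments, "voltaire")` points at the fragment that discusses the topic without carrying the keyword. The sketch ignores Org's tag inheritance from parent headlines; finding lemmata rather than exact word forms would need a lemmatizer, for example from NLTK.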
Hi Hajo,
I have been working on a similar problem: how to organize long, complex texts. After trying Jupyter, Org, Leo Editor and others, I found that the most powerful and flexible approach is to do it *inside* Pharo and to program extensions that work with the particular agile visualizations that are part of a data narrative. For that, I created Grafoscopio [1]. The User Manual [2] explains how to install and use it, compares it with similar and/or inspiring programs, and describes the gap it is trying to fill in the ecosystem. We have a "local first" approach, so the most up-to-date information is in Spanish at [3][4] (except for the User Manual, which is almost up to date and was written in English).

[1] http://mutabit.com/grafoscopio/index.en.html
[2] http://mutabit.com/repos.fossil/grafoscopio/doc/tip/Docs/En/Books/Manual/manual.pdf
[3] http://mutabit.com/grafoscopio/
[4] http://mutabit.com/dataweek/

Let me know if Grafoscopio works for you. It is my first program and the one I used to learn Pharo, so it still has rookie code in many places, but it is being improved and actively used.

Cheers,

Offray

On 22/03/18 15:43, Hajo Dezelski wrote:
> I have a large "database" (in the moment ~ 1 GB) of notes articles in
> three languages, all in plain text organized thematically in about 40
> *.org files.
Hi again Hajo,
I just saw that you mentioned Grafoscopio in the first mail of this thread and yes, you are right: we lack the text management tools you are looking for (dictionaries and text processing). We have preliminary support for referring to external files via node links, but since you already have such a database in Org mode files, maybe you would like to create an importer for Grafoscopio and extend it to suit your needs. BTW, Leo Editor [1] supports importing from Org, AFAIK, and, being a pure Python program, integrates well with Python, so maybe you could start there to support NLTK. That being said, the customization, live coding and visualization capabilities of the Pharo ecosystem are unbeatable when you are trying to suit your own needs.

[1] http://leoeditor.com/

Cheers,

Offray

On 22/03/18 17:42, Offray Vladimir Luna Cárdenas wrote:
> For that, I have created Grafoscopio[1].
>
> Let me know if Grafoscopio works for you.
In reply to this post by Hajo Dezelski
I came across this today. Maybe related.
cheers -ben

On 23 March 2018 at 01:54, Hajo Dezelski <[hidden email]> wrote:
> Hello,
Hello,
thanks to all for the pointers you offered. They were more than helpful.

So I will give it a try:

Grafoscopio seems to be a very good tool to get started with my project. So I will focus first on getting my text database into the image. I have to convert the *.org files to *.ston. One way could be via the Leo editor and Python. But first I have to study the *.ston data structure, to see what is in the box.

When this is done I will explore Syre and the Moose-Algos Information Retrieval. But this is a long way to go.

I will report.

Thanks again and have a nice weekend.

Cheers,
Hajo
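As a rough idea of what such a conversion step could look like on the Python side, here is a minimal sketch that turns Org headlines into a nested tree and serializes it as JSON. The field names (`header`, `body`, `children`) are hypothetical and are not Grafoscopio's actual *.ston layout, which still has to be studied as noted above. JSON is used because, as far as the STON documentation states, STON's reader can also parse plain JSON.

```python
import json
import re

# An Org headline: leading stars, whitespace, then the headline text.
HEADLINE = re.compile(r"^(\*+)\s+(.*\S)\s*$")


def org_to_tree(text):
    """Turn Org headlines into nested dicts: {header, body, children}."""
    root = {"header": "root", "body": "", "children": []}
    stack = [(0, root)]  # (headline level, node) pairs from the root down to the current node
    for line in text.splitlines():
        match = HEADLINE.match(line)
        if match:
            level = len(match.group(1))
            node = {"header": match.group(2), "body": "", "children": []}
            # Climb back up until the top of the stack is a shallower headline.
            while stack[-1][0] >= level:
                stack.pop()
            stack[-1][1]["children"].append(node)
            stack.append((level, node))
        else:
            # Plain lines belong to the body of the most recent headline.
            stack[-1][1]["body"] += line + "\n"
    return root


# Hypothetical sample, standing in for one *.org file.
SAMPLE = "* Topic A\nfirst note\n** Subtopic\nsecond note\n* Topic B\n"
tree = org_to_tree(SAMPLE)
print(json.dumps(tree, indent=2, ensure_ascii=False))
```

In this sketch Org tags stay inside the `header` string; splitting them into a separate field would be a natural next step once the target node structure is known.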
In reply to this post by Hajo Dezelski
Hajo Dezelski <[hidden email]> wrote:
> My main problem is the organisation of information in the form of
> lots of text objects. Here I used heavily Emacs and the org mode and
> still my favorite: Scrivener
> (https://www.literatureandlatte.com/scrivener/overview)
>
> I am still looking for an integrated environment to
> write/organize/analyse text. And I am sure that everything is in Pharo
> and what is missing can be programmed.

Yes, this should be reasonably doable. You could store the text fragments in git and build extra data structures, browsers and editors on demand, or also persist them. You might want to take a look at Ward Cunningham’s work on federated wiki for collaboration. You can find some GUI experiments I did for large screens at https://vimeo.com/139960287

Stephan
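The "build on demand or persist" idea can be illustrated with a word index over a folder of plain-text fragments, which is the kind of folder git would version. This is a minimal sketch in Python under invented assumptions: the fragments are `*.txt` files, the cache file is called `index.json`, and the function names are hypothetical. It makes no git calls and does not check whether a cached index is stale.

```python
import json
import re
import tempfile
from collections import defaultdict
from pathlib import Path


def build_index(folder):
    """Map each lowercase word to the sorted names of the files that contain it."""
    index = defaultdict(set)
    for path in sorted(Path(folder).glob("*.txt")):
        for word in re.findall(r"\w+", path.read_text(encoding="utf-8").lower()):
            index[word].add(path.name)
    return {word: sorted(names) for word, names in index.items()}


def load_or_build(folder, cache_name="index.json"):
    """Reuse a persisted index when present, otherwise build it and save it."""
    cache = Path(folder) / cache_name
    if cache.exists():
        return json.loads(cache.read_text(encoding="utf-8"))
    index = build_index(folder)
    cache.write_text(json.dumps(index), encoding="utf-8")
    return index


# Demonstration on a throwaway folder with two invented fragments.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "a.txt").write_text("Candide tends the garden", encoding="utf-8")
    Path(folder, "b.txt").write_text("The garden needs water", encoding="utf-8")
    index = load_or_build(folder)   # built and written to index.json
    cached = load_or_build(folder)  # second call reads the persisted copy
```

Because the index can always be rebuilt from the fragments, it would not need to be versioned together with them; only the text files are the source of truth.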
In reply to this post by Hajo Dezelski
Hi,
I would start by creating a single .org document that contains all the key attributes you want to carry over from your current database to the one inside Pharo. Then I would try to import it into Leo and save it as a .leo file (which is just XML). Then I would use the XML reading tools in Pharo to explore that document and see which elements can be stored in Grafoscopio. I think most of the scaffolding is already there: nodes with headers, tags, body and parent/children relationships. STON also already provides what you need for simple storage, outside the image, of your newly imported documents. We have complex books, like the Data Driven Journalist Handbook [1], contained in a single Grafoscopio notebook occupying just 600 KB.

[1] http://mutabit.com/repos.fossil/mapeda/

After that I would go to Syre and Moose, following Ben's timely and wise advice, as usual.

Cheers,

Offray

On 23/03/18 06:51, Hajo Dezelski wrote:
> So I will focus first to get my text database into the image. I have
> to convert the *.org files to *.ston. A way could be via the
> Leo-editor and Python.
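The message above suggests exploring the .leo file with Pharo's XML tools. Purely to show the shape of the data, the sketch below does the same exploration with Python's standard library instead. It assumes the usual .leo layout, in which `<v>` elements nest to form the outline, `<vh>` holds the headline, and `<t tx="...">` elements hold the bodies keyed by the `t` attribute of each `<v>`. The sample document and the function name `read_outline` are invented, and a real file exported by Leo should be checked against this assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written stand-in for a small .leo file.
SAMPLE_LEO = """\
<leo_file>
<vnodes>
<v t="n1"><vh>Topic A</vh>
<v t="n2"><vh>Subtopic</vh></v>
</v>
<v t="n3"><vh>Topic B</vh></v>
</vnodes>
<tnodes>
<t tx="n1">first note</t>
<t tx="n2">second note</t>
</tnodes>
</leo_file>
"""


def read_outline(xml_text):
    """Return the outline as nested dicts with header, body and children."""
    root = ET.fromstring(xml_text)
    # Body texts are stored separately and looked up by node id.
    bodies = {t.get("tx"): (t.text or "") for t in root.iter("t")}

    def convert(v):
        return {
            "header": v.findtext("vh", default=""),
            "body": bodies.get(v.get("t"), ""),
            "children": [convert(child) for child in v.findall("v")],
        }

    vnodes = root.find("vnodes")
    if vnodes is None:
        return []
    return [convert(v) for v in vnodes.findall("v")]


outline = read_outline(SAMPLE_LEO)
```

The resulting dictionaries mirror the scaffolding listed above (header, body, parent/children), which makes it easier to see which elements would map onto Grafoscopio nodes.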