Well, I extremely tersely asked a speaker on FileMaker Pro whether distributed hash tables came into his view, and he said that was hidden in the machine he had bought and used. I didn't dare mention distributed garbage collection.

I was afraid to ask whether he had semantic parsers for a natural-language interface into his database, or, better, had built them himself. That would have been perceived as my showing off and being off topic.

His business market base might not really approve of wondering how ideas are connected scientifically, and maybe he has given himself over to not being inquisitive. In a different context, natural-language questions aren't an embarrassment but a natural activity.

I think I saw that semantics, or at least a parser with a grammar (rather than just taking dictation), was in the voice-recognition framework in OS X. That was an exciting relief from crusty, bottom-line business dismissal (we're going to sell someone else's software to users rather than wonder how software is made or designed).

I have downloaded Fleury's PDF files to speed-read. Since I'm not steeped in the subject, they might not inspire me, just let me know a bit of what is going on and let me listen better to the discussion, though I know this is his heavy knowledge navigator stuff.
On Sat, 2007-05-12 at 13:46 -0500, Paul Sheldon wrote:
> [...]

I read both the papers, and they specifically address the means and algorithms of searching. While the word "semantic" is in the papers, their use of it refers to a centroid of the network, and they use an algorithm to calculate what they call "semantic distance", which deals with the relevance of the data to the words given to the search engine. At least, that is what my tiny mind took from the papers; if I am wrong, I hope someone will point out my error.

History, context, semantics and the perception of meaning are vital to human discourse. The process of reducing these to some algorithm is difficult, and they currently seem to be handled separately, at least in the minds of developers, which gives us lots of "almost there" interfaces (a great improvement on the "not even close" of the '80s). Yet there is another aspect of the process, that of presentation, and within presentation, that of human capacity, a sort of ergonomics of human understanding.

These search engines deal with organizing the search space for speed of delivery, and in p2p (peer-to-peer, as in the music-sharing sites, the social sites, and even Croquet spaces) the best and fastest search algorithm is one that runs the searches in parallel, hitting some optimum number of immediate peers and passing the search on to other groups, where the groups are typically referred to as clusters (bear with me: I know many of you understand this, but it helps me set my thoughts and stage later material). Such algorithms are most efficient when the search depth stays short and the number of active searches is high, so latency is low and response is fast. But this means a physical representation would be something like a hallway with 10,000 doors, and the casual human browser would be overwhelmed by the options. Ergonomics demands that the presentation be smaller, and that it be structured for best relevance to history, context, natural-language semantics, and hopefully some degree of perception.
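To make the shape of that flooding scheme concrete, here is a minimal Squeak sketch; every name in it (SketchPeer, the hop budget, plain string items) is invented for illustration and comes from neither the papers nor any existing package:

  Object subclass: #SketchPeer
      instanceVariableNames: 'neighbours items'
      classVariableNames: ''
      poolDictionaries: ''
      category: 'SearchSketches'

  SketchPeer >> search: queryString depth: hops into: aSharedQueue
      "Report local matches, then flood the query to each immediate
      neighbour in its own process until the hop budget is spent;
      a short depth and many simultaneous probes keep latency low."
      (items select: [:each | each includesSubstring: queryString])
          do: [:hit | aSharedQueue nextPut: hit].
      hops > 0 ifTrue:
          [neighbours do: [:peer |
              [peer search: queryString depth: hops - 1 into: aSharedQueue] fork]]

A caller hands in a SharedQueue and reads hits off it while the flood is still running, which is exactly the hallway-with-10,000-doors problem: the results arrive fast, but something still has to choose what to show.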
Perhaps a good visual might be an "Oracle" or "Teacher", where the participant could ask questions, the search algorithm would run, and a count of retrieved responses would be used to determine the available presentation level (say six or so, like the points of a good presentation). If the search was too broad, the number of returned points would be great, and the "Oracle" would ask more questions and restate the search over the returned locations to reduce the noise before presenting options to the user. There should be some limit to this interaction, and when the limit is reached, the available locations would be prioritized and the first N locations presented. If these were insufficient, then another round of searches would be initiated against the remaining data set, or a new data set could be generated.

This is just one thought, and I am not really happy with it.

I am by nature both a browser of information and a user of it. Thus there are times when I want a truly directed search, returning items that really match my criteria. Amazon.com or eBay is not an answer then, but just another place to browse. If I want data on an electronic device, say the ADC0808, then typing "datasheet on ADC0808" into a search engine should return the datasheets from the various manufacturers of that part (forgive me, National). On the other hand, if I am looking to learn about new converter technologies, I might search for "data and algorithms for analog to digital converters", where Amazon.com may have a book I should read, eBay may have some books for sale, and the papers of the IEEE or the ACM or the Robotics Institute (if it exists) might have information relevant to the issue. However, a good search assistant would ask "do you want to include commercial sites for books?" and I could answer yes or no to help it provide the desired information. If the search returned far too many sites, it might ask "do you want to see sales sites for converters?" and I could answer no.

Thus it could help filter the data. Yes, I could put all the filter information into the search query, but when I have to do that there is data I might overlook, or relationships I might not realize without reading the data from the site. I would like the systems to help restore balance. I do not want to see Britney Spears in my search results, thank you. This is not something that will be implemented immediately; it will take evolution on a computer scale, via open development, to bring about.

Then, once the information has been pared down to a reasonable number of potential targets, how would those be presented? And, importantly, how fast will this process be?

Am I making any sense at all?

Regards,
Les H
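The "Oracle" loop above reduces to something like this sketch, in which searchFor:, narrow:byAsking:, nextQuestion, presentationLimit, maxRounds and the relevance of a result are all stand-ins for machinery that would have to be built:

  SketchOracle >> ask: queryString
      "Search; while the result set is too broad, ask the user a
      narrowing question and re-search over what came back, up to a
      round limit; then present the first N locations by priority."
      | results rounds |
      results := self searchFor: queryString.
      rounds := 0.
      [results size > self presentationLimit and: [rounds < self maxRounds]]
          whileTrue:
              [results := self narrow: results byAsking: self nextQuestion.
              rounds := rounds + 1].
      ^ (results asSortedCollection: [:a :b | a relevance >= b relevance])
          first: (self presentationLimit min: results size)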
Makes sense to me. I put together a prototype (included with the "Maui" project downloadable from SqueakSource) called "ContextualSearch" that lets the user aggregate any number and kind of search stores into a Composite with the same API, like Google Co-Op within Squeak. Included are "CodeElementContext" and "FileSystemContext", but an additional "WebServiceContext" hierarchy would be most useful. Specifying keywords signals the results batched according to the degree of match (all-match, left-match, any-match). The result-set object is, itself, an independently searchable context with the same API, which you can later #refresh.

In addition to the question of which information stores to search, the actual execution of the search should be well considered too, particularly if all the results cannot be presented instantaneously. ContextualSearch presents "results as-you-go", meaning simply that it should search in the background and signal partial results as it finds them, in addition to progress complete (or at least activity, if the size of the search space is unknown, as is frequently the case).

I, too, am a proponent of "let the machine do the work". And, UI designers: stop making me constantly scan lists of hundreds of items, and please leverage the power of all available input devices instead of funneling everything through left-clicking the mouse.
Hello,
About semantics: will islands be "tagged" (the blogosphere method)? Or is it a question of pure content (the good old Google method)? Or a mix of the two?

For "ContextualSearch", I just wanted to submit a probably unrealistic or silly idea that I had recently: what about assisted search, or assisted data dissemination? Using an external API like dbpedia or freebase (when/if it comes to light), you could accelerate search and give content a semantic consistency. Did you investigate (mentally :p) this option?

Assisted search would work like this: if you look for something, the "guide" would ask you which category you want (for example, query Wikipedia for a word and you might get several categories), i.e. dig deeper after a lookup. The same goes the other way too: if the request is too precise and no direct neighbour has an answer, then re-emit a request that is more general (an upper semantic level, obtained by a DB lookup), until you find somebody that has a result (i.e. cached metadata that matches); find the specialist and you find the specialties.

People that have things in common have a higher statistical probability of having been to the same places (if places hold knowledge); they form a social network. Maybe these types of human behaviour could be applied to a data dissemination / caching scenario. I just didn't find publications about assisted p2p search...

Anyway, sorry for interrupting.

Thank you,
Florent
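Florent's climb-a-level-and-re-emit idea might look like the following sketch, with askNeighboursAbout: and broaderTermFor: standing in for the p2p probe and the external taxonomy (dbpedia/freebase) lookup, neither of which exists as written:

  SketchPeer >> resolve: query via: taxonomy
      "If no immediate neighbour answers, broaden the query by one
      semantic level from an external taxonomy and re-emit, until
      somebody's cached metadata matches or the taxonomy runs out."
      | current hits |
      current := query.
      [current notNil] whileTrue:
          [hits := self askNeighboursAbout: current.
          hits isEmpty ifFalse: [^ hits].
          current := taxonomy broaderTermFor: current].
      ^ #()

Finding the specialist by generalizing the question is the inverse of the "Oracle" above, which specializes it; a real guide would presumably need both directions.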
In reply to this post by Paul Sheldon-2
Chris Muller wrote something that heartened me, where I had been hammering at the PDF articles of Florent with some despair, and trying to write something I was embarrassed about, to inspire others.

A semantic space for checking distance between questions, and optimizing the distribution of information over some sort of p2p network for efficient access, doesn't present math that lifts my spirits into thinking I could understand how to do anything. However, when I think of it helping me find things to build with in software code, my brain really starts playing along!

I remember that an enormous amount of the disk space for Mac Xcode and Maya is just for hypertext. That means that to use glitzy code, one guy has to have access to gazillions of man-hours of saved work and digital assets. When I ask myself "how can I, myself, be of use?" or "how can I use all these others' man-hours?", I ask myself how folks are going to build this obscure knowledge navigator. We are talking about building open employment!