idea topology and databases


idea topology and databases

Paul Sheldon-2
Well, I extremely tersely asked a speaker on FileMaker Pro whether
distributed hash tables came into his view, and he said that was hidden
inside the machine he had bought and used. I didn't dare mention
distributed garbage collection.

I was afraid to ask whether he had semantic parsers for a natural-language
interface into his database, or better, had built them himself. That would
have been perceived as showing off and being off topic.

His business market base might not really approve of wondering how ideas
are connected scientifically, and maybe he has given himself over to not
being inquisitive. In a different context, natural-language questions
aren't an embarrassment but a natural activity.

I think I saw that semantics, or at least a parser with a grammar (rather
than just taking dictation), was in the voice recognition framework in
OS X. That was an exciting relief from the crusty, bottom-line business
dismissal (we're going to sell someone else's software to users rather
than wonder how software is made or designed).

I have downloaded Fleury's PDF files to speed-read. Since I'm not steeped
in the subject, they might not inspire me, just let me know a bit of what
is going on so I can listen better, though I know this is heavy
knowledge-navigator stuff.

Re: idea topology and databases

Les Howell
I read both the papers, and they specifically address the means and
algorithms of searching. While the word "semantic" is in the papers, their
use of it refers to a centroid of the network, and they use an algorithm
to calculate what they call "semantic distance", which deals with the
relevance of the data to the words in the search query.  At least that is
what my tiny mind took from the papers; if I am wrong, I hope someone will
point out my error.
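
    To make "semantic distance" a little more concrete, here is a toy
sketch (Python, my own illustration, not the algorithm from the papers):
treat the query and a cluster centroid as bags of words and take one minus
their cosine similarity.

    # Toy "semantic distance": 1 - cosine similarity of term-frequency
    # vectors. Only an illustration of the general idea, not the papers'
    # actual algorithm.
    from collections import Counter
    from math import sqrt

    def term_vector(text):
        """Bag-of-words term frequencies for a piece of text."""
        return Counter(text.lower().split())

    def semantic_distance(a, b):
        """0 means identical term vectors, 1 means no terms in common."""
        va, vb = term_vector(a), term_vector(b)
        dot = sum(va[t] * vb[t] for t in va)
        na = sqrt(sum(c * c for c in va.values()))
        nb = sqrt(sum(c * c for c in vb.values()))
        if na == 0 or nb == 0:
            return 1.0
        return 1.0 - dot / (na * nb)

    # A cluster "centroid" could be as crude as the concatenated text of
    # the documents the cluster holds:
    centroid = "analog digital converter datasheet sampling resolution"
    print(semantic_distance("datasheet for analog to digital converter",
                            centroid))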

    History, context, semantics and perception of meaning are vital to
human discourse.  The process of reducing these to algorithms is
difficult, and at present each seems to be handled separately, at least in
the minds of developers, which gives us lots of "almost there" interfaces
(a great improvement over "not even close" in the 80's).  Yet there is
another aspect of the process, that of presentation, and within
presentation, that of human capacity, a sort of ergonomics of human
understanding.

    These search engines deal with organizing the search space for speed
of delivery.  In p2p (peer-to-peer, such as the music-sharing sites, the
social sites, and even Croquet spaces) the best and fastest search
algorithm is one that runs the searches in parallel, hitting some optimum
number of immediate peers, with some form of passing that search on to
another group, where the groups are typically referred to as clusters
(bear with me, I know that many of you understand this, but it helps me
set my thoughts and stage later points).  Such algorithms are most
efficient when the search depth remains short and the number of active
searches is high, so latency is low and response is fast.  But this means
a physical representation would be something like a hallway with 10,000
doors, and the casual human browser would be overwhelmed by the options.
Ergonomics demands that the presentation be smaller, and that it be
structured for best relevance to history, context, natural-language
semantics, and hopefully some degree of perception.
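
    As a purely illustrative sketch (Python; names like Peer, fanout and
ttl are mine, not from any real protocol), the shallow, wide fan-out looks
something like this; a real implementation would contact the neighbours
concurrently rather than one after another:

    # Toy depth-limited, fan-out peer search. Each peer forwards the query
    # to at most 'fanout' neighbours until the hop limit 'ttl' runs out.
    # A real system would do the forwarding in parallel.
    class Peer:
        def __init__(self, name, documents, neighbours=None):
            self.name = name
            self.documents = documents          # strings this peer can serve
            self.neighbours = neighbours or []  # other Peer objects

    def search(start, query, fanout=3, ttl=2, seen=None):
        """Collect (peer, document) hits within 'ttl' hops of 'start'."""
        seen = seen if seen is not None else set()
        if start.name in seen:
            return []
        seen.add(start.name)
        hits = [(start.name, d) for d in start.documents
                if query.lower() in d.lower()]
        if ttl > 0:
            for peer in start.neighbours[:fanout]:
                hits.extend(search(peer, query, fanout, ttl - 1, seen))
        return hits

    # Tiny three-peer chain:
    c = Peer("c", ["ADC0808 datasheet"])
    b = Peer("b", ["notes on converters"], [c])
    a = Peer("a", [], [b])
    print(search(a, "ADC0808"))   # [('c', 'ADC0808 datasheet')]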

    Perhaps a good visual might be an "Oracle" or "Teacher": the
participant could ask questions, the search algorithm would run, and a
count of retrieved responses could be used to determine the available
presentation level (say 6 or so, like the points of a good presentation).
If the search was too broad, the number of returned points would be large,
and the "Oracle" would ask more questions and restate the search over the
returned locations to reduce the noise before presenting options to the
user.  There should be some limit to this interaction, and when the limit
is reached, the available locations would be prioritized and the first N
presented.  If those were insufficient, then another round of searches
would be initiated against the remaining data set, or a new data set could
be generated.
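
    In rough, runnable pseudocode (Python; run_search and ask_user are
placeholders I invented, not real APIs), the interaction loop might look
like this:

    # Toy "Oracle" loop: keep asking narrowing questions until the result
    # count fits the presentation level or the round limit is reached,
    # then show the top N.
    def oracle(query, run_search, ask_user, present_level=6, max_rounds=3):
        results = run_search(query)
        rounds = 0
        while len(results) > present_level and rounds < max_rounds:
            refinement = ask_user(
                "Your search matched %d items; add a narrowing term:"
                % len(results))
            query = query + " " + refinement
            # Restate the search over what was already returned:
            results = [r for r in results if refinement.lower() in r.lower()]
            rounds += 1
        # Stand-in prioritization; a real system would rank by relevance.
        return sorted(results)[:present_level]

    hits = oracle("converter",
                  lambda q: ["adc book", "dac chip", "adc app note"],
                  lambda question: "adc", present_level=2)
    print(hits)   # ['adc app note', 'adc book']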

    This is just one thought, and I am not really happy with it.

    I am by nature both a browser of information and a user of it.  Thus
there are times when I want a truly directed search, returning items that
really match my criteria; Amazon.com or eBay is not an answer then, just
another place to browse.  If I want data on an electronic device, say the
ADC0808, then typing "datasheet on ADC0808" into a search engine should
return the datasheets from the various manufacturers of that part (forgive
me, National).  On the other hand, if I am looking to learn about new
converter technologies, I might search for "data and algorithms for analog
to digital converters", where Amazon.com may have a book that I should
read, eBay may have some books for sale, and the papers of the IEEE or ACM
or the Robotics Institute (if it exists) might have information relevant
to the issue.  However, a good search assistant would ask "do you want to
include commercial sites for books?" and I could answer yes or no to help
narrow in on the desired information.  If the search returned far too many
sites, it might ask "do you want to see sales sites for converters?" and I
could answer no.  Thus it could help filter the data.  Yes, I could put
all the filter information into the search query, but when I have to do
that there is data I might overlook, or relationships I might not realize
without reading the data from the site.  I would like the system to help
restore balance.  I do not want to see Britney Spears in my search
results, thank you.  This is not something that will be implemented
immediately; it will take evolution on a computer scale, and open
development, to bring about.
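
    Something like the following toy sketch (Python, with made-up category
rules, purely to illustrate the dialogue) is all I have in mind: bucket
the hits, ask once per bucket, and drop what I decline.

    # Toy yes/no filtering dialogue. The category rules are hand-written
    # stand-ins, not a real classifier.
    CATEGORIES = {
        "commercial book sites": ("amazon.", "ebay."),
        "converter sales sites": ("shop", "store", "buy"),
    }

    def classify(url):
        for name, markers in CATEGORIES.items():
            if any(m in url.lower() for m in markers):
                return name
        return "other"

    def filter_results(urls, ask_user):
        """ask_user(question) should return True/False, e.g. a console prompt."""
        keep = {"other": True}
        for name in CATEGORIES:
            keep[name] = ask_user("Do you want to see %s?" % name)
        return [u for u in urls if keep[classify(u)]]

    hits = ["https://www.amazon.com/adc-book", "https://www.ti.com/adc0808.pdf"]
    print(filter_results(hits, lambda q: False))  # drops the Amazon hit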

Then once the information has been pared down to a reasonable number of
potential targets, how would those be presented?  And, importantly, how
fast will the whole process be?

Am I making any sense at all?

Regards,
Les H


Re: [OT] idea topology and databases

Chris Muller-3
Makes sense to me.  I put together a prototype (included with the
"Maui" project downloadable from SqueakSource) called
"ContextualSearch" that lets the user aggregate any number and kind of
search stores into a Composite of the same API.  Like Google Co-Op
within Squeak.  Included are "CodeElementContext" and
"FileSystemContext", but an additional hierarchy "WebServiceContext"
would be most useful.

Specifying keywords signals the results, batched according to the degree
of match (all-match, left-match, any-match).  The result-set object is
itself an independently searchable context with the same API, one that you
can later #refresh.
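
For anyone who hasn't looked at the package, the general shape is roughly
the following (a Python sketch with invented names, not the actual
ContextualSearch/Maui API): a Composite of search contexts sharing one
interface, whose result sets are themselves searchable contexts.

    # Illustrative composite-of-search-contexts sketch, not the real API.
    class SearchContext:
        def search(self, keywords):
            raise NotImplementedError

    class ListContext(SearchContext):
        """A trivial store: a list of strings."""
        def __init__(self, items):
            self.items = list(items)
        def search(self, keywords):
            hits = [i for i in self.items
                    if all(k.lower() in i.lower() for k in keywords)]
            return ResultSet(keywords, hits)

    class CompositeContext(SearchContext):
        """Fans a query out to all child contexts and merges the hits."""
        def __init__(self, children):
            self.children = children
        def search(self, keywords):
            hits = []
            for child in self.children:
                hits.extend(child.search(keywords).items)
            return ResultSet(keywords, hits)

    class ResultSet(ListContext):
        """Results are themselves a searchable context; a refresh could
        re-run the original keywords against the sources (omitted here)."""
        def __init__(self, keywords, items):
            super().__init__(items)
            self.keywords = keywords

    code = ListContext(["Object subclass: #Peer", "OrderedCollection new"])
    files = ListContext(["notes/peer-search.txt", "todo.txt"])
    everything = CompositeContext([code, files])
    print(everything.search(["peer"]).search(["search"]).items)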

In addition to the issue of what information stores to search, the actual
execution of the search deserves thought too, particularly if all results
cannot be presented instantaneously.  ContextualSearch presents "results
as-you-go", meaning simply that it searches in the background and signals
partial results as it finds them, along with progress toward completion
(or at least activity, if the size of the search space is unknown, as is
frequently the case).
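
A minimal sketch of that results-as-you-go pattern (again in Python, with
placeholder callbacks of my own; not how ContextualSearch is actually
implemented):

    # Run the search in a background thread and hand partial results to a
    # callback as they are found; signal completion at the end.
    import threading, time

    def background_search(items, query, on_partial, on_done):
        def work():
            found = []
            for item in items:
                if query.lower() in item.lower():
                    found.append(item)
                    on_partial(item)      # signal each hit as it appears
                time.sleep(0.01)          # simulate a slow store
            on_done(found)
        threading.Thread(target=work, daemon=True).start()

    docs = ["peer search notes", "shopping list", "semantic distance paper"]
    done = threading.Event()
    background_search(docs, "search", print, lambda all_hits: done.set())
    done.wait()   # in a UI this would be an event loop, not a blocking wait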

I, too, am a proponent of "let the machine do the work".  And, UI
designers: stop making me constantly scan lists of hundreds of items, and
please leverage the power of all available input devices instead of
funneling everything through left-clicking the mouse.




Re: [OT] idea topology and databases

Florent THIERY-2
Hello,

About semantics: will islands be "tagged" (the blogosphere method)? Or is
it a question of pure content (the good old Google method)? Or a mix of
the two?

For"ContextualSearch", i justed wanted to submit a probably
irrealistic or silly idea that i had recently: what about assisted
search, or assisted data dissemination? Using an external API like
dbpedia or freebase (when/if it comes to light), you could accelerate
search, give content a semantic consistence. Did you investigate
(mentally :p) this  option?

Assisted search would work like this: if you look for something, the
"guide" would ask you which category you want (e.g. query Wikipedia for a
word and you might get several categories), i.e. dig deeper after a
lookup.

The same goes the other way too: if the request is too precise and no
direct neighbour has an answer, then re-emit a more general request (a
higher semantic level, obtained by a DB lookup) until you find somebody
that has a result (i.e. cached metadata that matches); find the
specialist, and you find the specialties.
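
Very roughly, in Python (the BROADER table just stands in for a DBpedia or
Freebase lookup, and all the names here are invented for illustration):

    # Generalize the query one semantic level at a time until some peer
    # has cached metadata that matches.
    BROADER = {                      # toy stand-in for an external semantic DB
        "ADC0808": "analog-to-digital converter",
        "analog-to-digital converter": "electronics",
        "electronics": "engineering",
    }

    def generalizing_search(term, peers):
        """peers: list of dicts mapping a term to cached metadata."""
        while term is not None:
            for peer in peers:
                if term in peer:
                    return term, peer[term]   # found a specialist at this level
            term = BROADER.get(term)          # climb one semantic level and retry
        return None, None

    peers = [{"electronics": ["hobbyist wiki", "parts index"]}]
    print(generalizing_search("ADC0808", peers))
    # -> ('electronics', ['hobbyist wiki', 'parts index'])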

People who have things in common are statistically more likely to have
been to the same places (if places hold knowledge); they form a social
network. Maybe these kinds of human behaviour could be applied to a data
dissemination / caching scenario.

I just didn't find publications about assisted p2p search...

Anyway, sorry for interrupting. Thank you

Florent

Re: [OT] idea topology and databases

Paul Sheldon-2
Chris Muller wrote something that heartened me, just when I had been
hammering at Florent's PDF articles with some despair and trying to write
something (which I was embarrassed about) to inspire others.

A semantic space for checking the distance between questions, and for
optimizing the distribution of information over some sort of p2p network
for efficient access, doesn't present math that lifts my spirits into
thinking I could understand how to do anything.

However, when I think of it helping me find things to build with in
software, my brain really starts playing along!

I remember that an enormous amount of the disk space for Mac Xcode and
Maya is just for hypertext. That means that to use glitzy code, one person
has to have access to gazillions of man-hours of saved work and digital
assets.

When I ask myself "How can I, myself, be of use?" or "How can I use all
these other people's man-hours?", I ask myself how folks are going to
build this obscure knowledge navigator.

We are talking of building open employment!