[squeak-dev] Beyond email, the Social Semantic Desktop

Paul D. Fernhout

Ken wrote:
> I agree that for now we should simply move on but perhaps give a little
> thought to what might be a workable alternative if this does turn into a
> real problem in the future.

The recent spam with forged senders to the Squeak list
http://lists.squeakfoundation.org/pipermail/squeak-dev/2009-January/133725.html
suggests public email lists are going to be less and less useful.

I know from experience how unsettling it can be for the person whose email
address gets forged: many years ago, an email forged to appear to be from me
was sent to a mailing list related to Doug Engelbart's Unrev-II colloquium.
Ultimately such things probably cannot be prevented entirely (security is
never perfect), but such events can certainly be made less annoying and much
rarer.

One might ask how Squeak or similar systems could help with that.

Filtering at any single point will never be perfect. One issue with email
(unlike web forums) is that the community can't easily tag information in a
machine-readable way *after* it is sent out, for example as "forged" or even
just "boring" or "interesting". One can even imagine tagging emails after
they are sent with richer information, like Slashdot's moderation by various
human moderators, which your email client would then interpret locally using
moderator reputations supplied by you or by someone you trust. Perhaps you
might only read emails to the Squeak list that Tim Rowledge marked as
VM-related. :-) Or, one can imagine better ways of splitting up long emails
into a variety of topics after they were sent and creating better links
between the ideas. So, if we have a way to tag information after it is sent
out to a bunch of people, we can work collaboratively and stigmergically to
build information into knowledge (or an approximation thereof). This is in
some sense how Doug Engelbart's first email-like system worked in Augment,
although it ran on only one central machine. There are some commercial
systems that support shared distributed workspaces for communications, but
we all know a dynamic and open source language platform should be able to
support this better. :-) Wikis, invented by Smalltalker Ward Cunningham,
already work this way in a sense (content keeps being modified after the
first post), but they are generally not distributed and are mostly
restricted to a textual approach (although "semantic wikis" are emerging,
like Semantic MediaWiki).
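To make the moderation idea concrete, here is a minimal Python 3 sketch of
interpreting after-the-fact tags locally. It assumes tags arrive as
(message id, moderator, tag) records and that each reader keeps a private
table of moderator reputations; the names, weights, and threshold are all
hypothetical.

```python
from collections import defaultdict

# Hypothetical after-the-fact tags: (message id, moderator, tag).
TAGS = [
    ("msg-1", "timr", "vm-related"),
    ("msg-2", "timr", "forged"),
    ("msg-2", "alice", "interesting"),
    ("msg-3", "mallory", "interesting"),
]

# Reputations chosen locally by the reader (or by someone the reader trusts).
# Moderators missing from this table count for nothing.
TRUST = {"timr": 1.0, "alice": 0.5}


def tag_scores(tags, trust):
    """Sum trust-weighted votes for each (message id, tag) pair."""
    scores = defaultdict(float)
    for msg_id, moderator, tag in tags:
        scores[(msg_id, tag)] += trust.get(moderator, 0.0)
    return scores


def select(msg_ids, tags, trust, wanted, threshold=0.5, hidden=("forged",)):
    """Keep messages whose `wanted` score reaches the threshold,
    dropping any message that trusted moderators marked as hidden."""
    scores = tag_scores(tags, trust)
    kept = []
    for msg_id in msg_ids:
        if any(scores[(msg_id, h)] >= threshold for h in hidden):
            continue
        if scores[(msg_id, wanted)] >= threshold:
            kept.append(msg_id)
    return kept


print(select(["msg-1", "msg-2", "msg-3"], TAGS, TRUST, "vm-related"))
# prints ['msg-1']
```

Because the trust table lives on the reader's side, two readers can see
different views of the same archive; here mallory's tag on msg-3 has no
effect simply because mallory is not in the table.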

Distributed systems like email are good approaches for many reasons, such as
avoiding the bottleneck of everyone reading a web forum from the same server,
and creating redundancy because everyone has a local copy of each message
after it is sent. So the basic technical idea of email is not that bad in
terms of being distributed (likewise, Usenet was a really good idea in some
ways for moving stuff around in a distributed way). But email is showing its
age because it is still perceived mostly as being about sending free-form
text, so there is no obvious way to hook tagging into it. And of course HTML
mail was a step in the wrong direction: it didn't address the general
problem of sending complex information that people could easily collaborate
to enhance, and at the same time it made plain text email more difficult to
use. Still, HTML mail showed that one can build things on top of email, as
do MIME types and attachments. We could still use something better, and
many agree, but upgrading email seems like an impossible task and not worth
the bother. Backwards compatibility is nice, so a new system would ideally
support conventional email, perhaps adding extra tagging information inline
or in attachments. But all that still seems more complicated than it is
worth. What if there were other good reasons to switch to a new
communications platform?
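For the backwards-compatible route, tags could ride along as an attachment on
ordinary email, so conventional clients still show a readable message. A
minimal Python 3 sketch using the standard library's email package follows;
the JSON payload layout and the tags.json filename are assumptions for
illustration, not an existing convention.

```python
import json
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default


def make_tag_email(sender, list_addr, target_id, tags):
    """Build an ordinary email whose attachment carries machine-readable tags
    for an earlier message (identified here by its Message-ID)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = list_addr
    msg["Subject"] = f"[tags] {target_id}"
    # Plain text body so conventional mail clients still show something useful.
    msg.set_content(f"Tagging {target_id} as: {', '.join(tags)}\n")
    payload = json.dumps({"target": target_id, "tags": tags}).encode("utf-8")
    msg.add_attachment(payload, maintype="application", subtype="json",
                       filename="tags.json")
    return msg


def read_tags(raw_bytes):
    """Return the tag payload from a received message, or None if absent."""
    msg = message_from_bytes(raw_bytes, policy=default)
    for part in msg.iter_attachments():
        if part.get_content_type() == "application/json":
            return json.loads(part.get_content())  # bytes for application/*
    return None


raw = make_tag_email("paul@example.org", "squeak-dev@example.org",
                     "<abc123@example.org>", ["forged"]).as_bytes()
print(read_tags(raw))
# prints {'target': '<abc123@example.org>', 'tags': ['forged']}
```

A tag-aware client would apply the payload to its local copy of the target
message; everyone else just sees a short plain text note.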

I'd suggest the future may not be so much in reinventing mailing lists and
email with authentication, but rather more in moving entirely to a new
multi-purpose distributed paradigm like the "Social Semantic Desktop" (SSD)
idea:
http://www.semanticdesktop.org
From there: "The Internet, electronic mail, and the Web have revolutionized
the way we communicate and collaborate - their mass adoption is one of the
major technological success stories of the 20th century. We all are now much
more connected, and in turn face new resulting problems: information
overload caused by insufficient support for information organization and
collaboration. For example, sending a single file to a mailing list
multiplies the cognitive processing effort of filtering and organizing this
file times the number of recipients -- leading to more and more of peoples'
time going into information filtering and information management activities.
There is a need for smarter and more fine-grained computer support for
personal and networked information that has to blend the boundaries between
personal and group data, while simultaneously safeguarding privacy and
establishing and deploying trust among collaborators."

That page goes on to provide more details including: "P2P and Grid
computing, especially in combination with the Semantic Web field, develops
technology to interconnect large communities without centralized
infrastructures for data and computation sharing, which is necessary to
build heterogeneous, multi-organizational collaboration networks."

You could imagine a Social Semantic Desktop from one point of view as being
like a really smart email client, where you could send out emails to mailing
lists saying how to tag previous emails. But, one can imagine supporting all
sorts of backends besides email to let small workgroups communicate in
various ways. And one can imagine having all sorts of applications running
on top of this distributed infrastructure (my wife and I have a couple of new
ones in mind related to manufacturing and storytelling). Squeak does
have distributed applications like Croquet,
http://en.wikipedia.org/wiki/Croquet_project
but I'm talking about a more general, shared infrastructure for all these ideas.
Perhaps the TeaTime TObject idea, while great for what it does with Croquet,
is not the only approach to abstraction for the general problem of
collaborative work on information?

I have some notions on this myself which I have been pursuing in Jython for
the JVM:
http://sourceforge.net/projects/pointrel/
Sorry it's not in Squeak, although earlier versions of that code, from
years back before the SSD focus, were for Squeak. Essentially, I've been
working on a triple store called Pointrel on-and-off for about a quarter
century. That work even predates WordNet, which was perhaps inspired by it
in a tiny way.
Why isn't it done yet? Good question. :-)

I'd suggest that using Squeak as the basis for a similar or better Social
Semantic Desktop might be the day that Squeak conquers the world. :-) Or at
least a bigger part of it than it has already with Seaside, Croquet,
Scratch, etc. :-)

Maybe it could be done using TeaTime, but here is another possibility coming
at this situation from a different perspective.

Here is a rough idea of what I think is a good architecture for a Social
Semantic Desktop, which I am working towards right now. Although I'm
working in Jython to leverage Java, anybody could work on one in Squeak in
a spirit of friendly co-opetition, and people are welcome to sign up for
the Pointrel SSD mailing list on SourceForge and use it as a place to bounce
around ideas for a Squeak version for now. That list:
http://sourceforge.net/mailarchive/forum.php?forum_name=pointrel-discuss
Obviously, the hope is to replace that list with the system itself, so the
list is just for bootstrapping. :-)

Basic implementation ideas:

* Internally, information is stored in the equivalent of RDF triples.
http://en.wikipedia.org/wiki/Resource_Description_Framework
I'm using a variant of a triple with a context field as well, like NEPOMUK does:
http://nepomuk.semanticdesktop.org
(NEPOMUK is an SSD effort for KDE.) Triples are a general-purpose way to
store information, essentially just saying how digital objects link
together. One might expect there would be ways to place Smalltalk objects
into sets of triples (or even just strings) and get them back out again.
Each of the four fields I'm using has both a data value and a namespace
describing how to interpret that data. Using the RDF naming convention, there
are "subject", "predicate", and "object" fields. (I actually prefer
"object", "attribute", "value", which are more OO-like.) The fourth
"context" field is roughly equivalent to a file type and file name. In the
approach I am currently taking, triples are added and removed through
"transactions", which are sets of triples to add to or remove from the
triple store all together.

* Objects are essentially defined by these triples, and their complete
history is implicit in the list of all transactions which is stored on disk
like a Smalltalk changes file (although possibly transactions might be
stored as lots of little files with UUID names until they are periodically
joined together in a larger file). Storing the entire history of the shared
database is wasteful in some ways, but disk space is cheap these days;
likely the whole history will be in memory as well (and that's how I'm doing
the Jython implementation). For any implementation, the RDF database is
either entirely in memory (easy for a first try and for coding it yourself)
or on disk and cached. Multiple RDF triple stores already do this
efficiently on disk with a memory cache, including query languages etc., but
none that I know of use distributed transaction files like those I describe
below (maybe one does?).

* Applications like email, shared to-do lists, virtual worlds, shared eToys,
source code control systems, etc. would then all run on top of this
distributed system. Not every client might have all transactions, in which
case their data may have inconsistencies, but applications should be written
to be forgiving of this. :-) Or, applications that are less forgiving would
require more exacting backends than email lists (like a central database,
an IRC-like system, or TeaTime for real-time low-latency applications).
Essentially, different distributed databases might have different preferred
coordination backends depending on their need for reliability, low latency,
or public accessibility. This system could potentially replace the Squeak
changes and sources files and various Squeak source control systems
someday, say if each change to the system was written out as a transaction
in an XML file. (A few years back I also implemented, as an experiment, an
OO-like system built purely on triples that supported a crude
Smalltalk-like environment, but that's another issue; for now I just see
this approach as a flexible shared database for transmitting data that
usually lives in objects. I mention that proof-of-concept only to show how
general-purpose triples can be.)

* Information could be exchanged as "transactions" distributed in some
way, such as via mailing list email attachments, a source code control
system like git, a CGI-based system, a remote database, WebDAV, TeaTime, or
whatever backends one had access to for transmitting these transaction files
(or the equivalent). I have not implemented this part yet; the system I have
so far just uses one big file and, for shared use, can redirect changes to
that file through a CGI script, which is somewhat like a mailing list archive
that clients can consult to get all the recent changes they don't have yet.
These transactions could have digital signatures to make forgeries more
difficult (though I have not implemented that either). Each transaction is
mainly a list of triple additions or deletions which modify the triple store
(with a timestamp, an author, and licenses granted for each action; in
general the system currently goes overboard with explicit license granting).
Each transaction also has a timestamp, and the file also specifies which
distributed database it goes with (via a UUID). (It might make sense to have
more than one transaction in a file.) XML is probably an OK choice for these
transaction files, which in a sense represent complex messages about how to
change the state of a triple store. I currently use a different,
easy-to-read plain text format, but I'm thinking of bowing to the inevitable
standard issue of XML (especially so email attachments could just be
innocent-looking XML files). It really does not matter much what the
interchanged data looks like, and the local repository could be in a
different format. Each transaction has a UUID, and it is OK to receive the
same transaction twice as long as both copies are identical.
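The data model in the list above (four fields, each pairing a namespace with
data, changed only through transactions, with the store's state implied by
its transaction history) can be sketched in a few lines of Python 3. This is
a minimal in-memory sketch under my own assumptions, not Pointrel's actual
code; names such as Term, Quad, and QuadStore are hypothetical, and a
Jython 2.x version would need small changes.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, NamedTuple, Tuple


class Term(NamedTuple):
    namespace: str  # how to interpret `data`
    data: str


class Quad(NamedTuple):
    subject: Term    # "object" in OO terms
    predicate: Term  # "attribute"
    obj: Term        # "value"
    context: Term    # roughly a file type plus file name


@dataclass(frozen=True)
class Transaction:
    database: str  # UUID of the shared database this belongs to
    author: str
    timestamp: str
    changes: Tuple[Tuple[str, Quad], ...]  # ("add" | "remove", quad)
    tx_id: str = field(default_factory=lambda: str(uuid.uuid4()))


class QuadStore:
    """In-memory store whose state is the result of applying transactions."""

    def __init__(self):
        self.quads = set()
        self.log: List[Transaction] = []  # full history, like a changes file

    def apply(self, tx):
        for op, quad in tx.changes:
            if op == "add":
                self.quads.add(quad)
            elif op == "remove":
                self.quads.discard(quad)
            else:
                raise ValueError(f"unknown operation {op!r}")
        self.log.append(tx)

    def values(self, subject, predicate):
        """All values currently linked to `subject` through `predicate`."""
        return {q.obj for q in self.quads
                if q.subject == subject and q.predicate == predicate}

    @classmethod
    def replay(cls, log):
        """Rebuild a store from scratch by re-applying a transaction history."""
        store = cls()
        for tx in log:
            store.apply(tx)
        return store


# Usage with hypothetical namespaces and values.
NS = "http://example.org/ns#"
paul, name = Term(NS, "paul"), Term(NS, "name")
ctx = Term("file", "people.txt")
old = Quad(paul, name, Term("string", "Paul"), ctx)
new = Quad(paul, name, Term("string", "Paul D."), ctx)

store = QuadStore()
store.apply(Transaction("db-1", "paul", "2009-01-20T12:00:00Z", (("add", old),)))
store.apply(Transaction("db-1", "paul", "2009-01-21T09:30:00Z",
                        (("remove", old), ("add", new))))
print(store.values(paul, name))
# prints {Term(namespace='string', data='Paul D.')}
```

Keeping the log means any client can rebuild the current state with
QuadStore.replay, which is the brute-force, everything-in-memory approach;
a disk-backed store would cache the result instead of replaying every time.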

There are some other details, but that is the basic picture.
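The interchange side can be sketched too. This Python 3 sketch uses the
standard library's xml.etree.ElementTree; the element and attribute names
are invented for illustration and are not the Pointrel format. It also shows
one way to handle a repeated transaction UUID: accept an identical copy
silently and reject a copy that differs.

```python
import xml.etree.ElementTree as ET

FIELDS = ("subject", "predicate", "object", "context")


def tx_to_xml(tx):
    """Serialize a transaction dict to a compact XML string."""
    root = ET.Element("transaction", id=tx["id"], database=tx["database"],
                      author=tx["author"], timestamp=tx["timestamp"])
    for op, quad in tx["changes"]:  # op is "add" or "remove"
        change = ET.SubElement(root, op)
        for name, (namespace, data) in zip(FIELDS, quad):
            ET.SubElement(change, name, namespace=namespace).text = data
    return ET.tostring(root, encoding="unicode")


def tx_from_xml(text):
    """Parse the XML produced by tx_to_xml back into a transaction dict."""
    root = ET.fromstring(text)
    changes = []
    for change in root:
        quad = tuple((change.find(name).get("namespace"),
                      change.find(name).text or "") for name in FIELDS)
        changes.append((change.tag, quad))
    return {**root.attrib, "changes": changes}


class Inbox:
    """Tracks received transactions by UUID so duplicates are harmless."""

    def __init__(self):
        self.seen = {}  # transaction id -> canonical XML

    def receive(self, text):
        """Return True for a new transaction, False for an identical repeat."""
        tx = tx_from_xml(text)
        canonical = tx_to_xml(tx)  # re-serialize so formatting differences don't matter
        previous = self.seen.get(tx["id"])
        if previous is None:
            self.seen[tx["id"]] = canonical
            return True
        if previous == canonical:
            return False
        raise ValueError(f"conflicting copies of transaction {tx['id']}")


# Usage with a hypothetical transaction.
tx = {"id": "0b6f2e1c-1111-4c1a-9a7e-000000000001", "database": "db-1",
      "author": "paul", "timestamp": "2009-01-20T12:00:00Z",
      "changes": [("add", (("ns", "paul"), ("ns", "name"),
                           ("string", "Paul"), ("file", "people.txt")))]}
xml_text = tx_to_xml(tx)
inbox = Inbox()
print(inbox.receive(xml_text), inbox.receive(xml_text))  # prints True False
```

Because receiving the same file twice is a no-op, it would not matter whether
a transaction arrived via an email attachment, a CGI archive, or both.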

Maybe people well-versed in OO design or distributed systems here might have
better ideas, or even point to existing systems which might be better
matches for either Squeak or the JVM than what I propose. If so, I'd
appreciate hearing about them. Here is one somewhat related idea released as
open source by NASA:
http://infolab.stanford.edu/~maluf/papers/xdb_ipg_ggf03.pdf
"This paper describes XDB-IPG, an open and extensible database architecture
that supports efficient and flexible integration of heterogeneous and
distributed information resources. XDB-IPG provides a novel “schema-less”
database approach using a document-centered object-relational XML database
mapping. This enables structured, unstructured, and semi-structured
information to be integrated without requiring document schemas or
translation tables. XDB-IPG utilizes existing international protocol
standards of the World Wide Web Consortium Architecture Domain and the
Internet Engineering Task Force, primarily HTTP, XML and WebDAV. Through a
combination of these international protocols, universal database record
identifiers, and physical address data types, XDB-IPG enables an unlimited
number of desktops and distributed information sources to be linked
seamlessly and efficiently into an information grid. XDB-IPG has been used
to create a powerful set of novel information management systems for a
variety of scientific and engineering applications."

Mine is mainly just simpler. :-)

I do waffle sometimes about going back to Squeak to do this (just as I
waffle about spending time on getting a Squeak-like system on the JVM). I am
really impressed by Scratch, for example, as a stand-alone application. If
there were a motivated group of people here interested in such a system, I
might be tempted to move back to the Squeak side for a time (maybe for
prototyping, and then seeing how it goes after that), especially as the
Squeak license issues are getting cleaned up. However, I'd be very rusty in
Squeak by now, so given my current expertise in Jython it is not my first
choice. But the core I describe here is not very hard to build in any
language if you take the brute-force approach (everything in memory when you
want to use a distributed database); what takes the most time is writing the
GUI applications on top of the core (things that work like an email list, or
like Wikipedia or Knol, or like SVN, or like any of many other systems for
stand-alone work or collaboration). Squeak has the rudiments of many of
those things, so there is a good argument for building on top of, say,
Celeste as an email client, or using Scamper to build a distributed wiki
system. This sort of project might really be facilitated by all the years of
hard work people have put into Squeak applications, leveraging all those
somewhat copycat Pink Plane efforts in a radically new Blue Plane
distributed-database sense. Even the original Augment code that has been
redone in Squeak could be integrated for fun. In any case, I'd be happy to
discuss these issues with Squeakers who wanted to build their own system
even if I stayed with Jython, with an eye to compatibility for the files or
other protocols used to interchange transactions or handle other issues. And
the more I think about the possibilities of leveraging all those previous
Squeak efforts to build a self-contained environment, the more interesting
it sounds to do this in Squeak. Still, there obviously are free and open
source clients for email, web browsing, etc. in a lot of languages, so one
would expect that once the system was defined and usable, many other people
would adopt the distributed back end for their own systems (like writing
Thunderbird plugins or whatever).

I know this all sounds ambitious, but the key idea is that this is workgroup
software: it becomes useful as soon as a few people want to work together
(as with Croquet), and an email gateway lets the rest of the world, which
does not adopt the system, still stay in touch with the workgroup.

Anyway, I know I'm taking advantage of an unfortunate situation to toot my
own project's horn, sorry. And I probably would not have sent this email if
I had not seen that spam, as I'm mostly into Jython right now. Still, even
regular spam is very annoying; I already pretty much never see email sent
to me unless it:
* Is on a mailing list I have a filter for (but this spam got around that,
which suggests that approach won't work much longer for public lists),
* Has my name in it (even that is getting iffy since I signed up for some
Google Groups; somehow spammers now connect my name and email address),
* Has one of a few other common terms I am interested in (actually the spam
to squeak-dev triggered a filter I have, or I might not have noticed it
since I don't follow this list closely these days), or
* Comes from a whitelist of senders I know and have added manually (and
that may fall apart if the spammers improve their forgeries).
The rest of my recent email sits in a pile of over 40000 unread messages
(all spam, I hope, though sometimes I search on it and notice something that
slipped by). And that is 40000 spam messages even with SpamAssassin on the
email server filtering out the worst ones. And that's 40000 spams just since
last I cleaned that file out some months ago. Granted, my email address has
been on the web for more than ten years.
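Those four rules amount to a simple "any of these" test. A minimal Python 3
sketch follows, with hypothetical names and sample data; it also makes the
weakness visible, since a forged message carrying the list's identifier
passes the first rule no matter who actually sent it.

```python
from dataclasses import dataclass


@dataclass
class Mail:
    sender: str
    list_id: str  # list identifier taken from the List-Id header, "" if none
    text: str     # subject plus body


def is_visible(mail, lists, my_names, keywords, whitelist):
    """True if the mail passes at least one of the four rules."""
    text = mail.text.lower()
    return (
        (mail.list_id != "" and mail.list_id in lists)  # filtered mailing list
        or any(n.lower() in text for n in my_names)      # mentions me by name
        or any(k.lower() in text for k in keywords)      # a term I care about
        or mail.sender.lower() in whitelist              # whitelisted sender
    )


LISTS = {"squeak-dev.lists.squeakfoundation.org"}
NAMES = ["Fernhout"]
KEYWORDS = ["Pointrel", "semantic desktop"]
WHITELIST = {"friend@example.org"}  # stored lowercase

spam = Mail("bot@spam.example", "", "Cheap watches")
forged = Mail("bot@spam.example", "squeak-dev.lists.squeakfoundation.org",
              "Cheap watches")
print(is_visible(spam, LISTS, NAMES, KEYWORDS, WHITELIST))    # prints False
print(is_visible(forged, LISTS, NAMES, KEYWORDS, WHITELIST))  # prints True
```

Every rule checks something a forger controls, which is why tags applied
later by trusted people could add a signal that this kind of filter lacks.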

But in any case, the best reason to do this is actually not to get rid of
spam. It is to enable people to stigmergically refine knowledge-related
digital artifacts, as people like Doug Engelbart, Vannevar Bush, Ted
Nelson, and Theodore Sturgeon envisioned decades ago.
http://en.wikipedia.org/wiki/Stigmergy

Still, I'd suggest this unfortunate incident might be more motivation for
Squeakers to do something about email in general to the extent technology
can help, and the above Social Semantic Desktop idea is one approach which
some Squeakers might be interested in. No doubt spammers will catch up
eventually, but in the meantime things may improve for a time, plus there
will be the new distributed applications. In the long term, social change to
a world of abundance for all may be a better solution, of course, at least
to reduce the motivation for most commercial spam; so in that sense the
abundance being facilitated by the internet will help the internet defend
itself from spammers in one way or another. :-)

--Paul Fernhout
http://www.pdfernhout.net/