Ken wrote:
> I agree that for now we should simply move on but perhaps give a little
> thought to what might be a workable alternative if this does turn into a
> real problem in the future.

The recent spam with forged senders to the Squeak list
http://lists.squeakfoundation.org/pipermail/squeak-dev/2009-January/133725.html
suggests public email lists are going to be less and less useful. Many years ago, an email was forged to appear to be from me on a mailing list related to Doug Engelbart's Unrev-II colloquium, so I know how unsettling that can be for the person whose address gets forged. Ultimately such things probably cannot be prevented entirely (security is never perfect), but they can certainly be made less annoying and much rarer. One might ask how Squeak or similar systems could help with that.

Filtering at any single point will never be perfect. One issue with email (unlike web forums) is that the community can't easily tag information *after* it is sent out, in a machine-readable format, as "forged" or even just "boring" or "interesting". One can even imagine tagging email after it is sent with a complex set of information, say like Slashdot's moderation by various human moderators. That information would then be interpreted locally in your email client using moderator reputations that you yourself, or someone you trust, supply. Perhaps you might only read emails to the Squeak list that Tim Rowledge marked as VM-related. :-) Or, one can imagine better ways of splitting long emails into a variety of topics after they are sent and creating better links between the ideas. So, if we have a way to tag information after it is sent out to a bunch of people, we can work collaboratively and stigmergically to build information into knowledge (or an approximation thereof). This is in some sense how the first email-like system by Doug Engelbart worked with Augment, although it was done on only one central machine.
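The following is a minimal Python sketch of the local-interpretation idea. It assumes two things. First, tags arrive as hypothetical (message-id, moderator, tag) records published after the original posts. Second, each reader keeps a personal table of moderator trust weights. The message ids, moderator names, weights and threshold are all made up. A real system would also have to cope with forged tags.

```python
from collections import defaultdict

# Hypothetical tag records published after the original posts:
# (message_id, moderator, tag). Nothing here is an existing client's API.
TAGS = [
    ("<msg1@example.org>", "tim", "vm"),
    ("<msg2@example.org>", "tim", "boring"),
    ("<msg2@example.org>", "stranger", "vm"),  # unknown moderator, ignored
    ("<msg3@example.org>", "alice", "vm"),
]

# Each reader supplies (or borrows from someone they trust) moderator weights.
TRUST = {"tim": 1.0, "alice": 0.5}


def score_messages(tags, trust):
    """Sum the trusted moderators' weights per message and per tag."""
    scores = defaultdict(lambda: defaultdict(float))
    for message_id, moderator, tag in tags:
        weight = trust.get(moderator, 0.0)
        if weight > 0:
            scores[message_id][tag] += weight
    return scores


def select(tags, trust, wanted, threshold=1.0):
    """Message ids whose trusted score for `wanted` reaches `threshold`."""
    scores = score_messages(tags, trust)
    return sorted(
        message_id
        for message_id, tag_scores in scores.items()
        if tag_scores.get(wanted, 0.0) >= threshold
    )


print(select(TAGS, TRUST, "vm"))                 # ['<msg1@example.org>']
print(select(TAGS, TRUST, "vm", threshold=0.5))  # ['<msg1@example.org>', '<msg3@example.org>']
```

The tags are stored separately from the messages. Swapping TRUST for someone else's table therefore changes what you see, and nobody has to resend any mail.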
There are some commercial systems that do support shared distributed workspaces for communication, but we all know a dynamic and open source language platform should be able to support this better. :-) Wikis, which Smalltalker Ward Cunningham invented, already work this way in a sense (continued modifications after the first post). However, they are generally not distributed and are generally restricted to a textual approach (although "semantic wikis" are emerging, like Semantic MediaWiki). Distributed systems like email are really good approaches for a lot of reasons. They avoid the bottleneck of everyone reading a web forum from the same server. They also create redundancy, because everyone has a local copy of each message after it is sent. So, the basic technical idea of email is not that bad in terms of being distributed (likewise, Usenet was a really good idea in some ways for moving stuff around in a distributed way). But email is showing its age because it is still perceived mostly as a way of sending free-form text, so there is no obvious way to hook tagging into it. And of course HTML mail was a step in the wrong direction. It didn't address the general problem of sending complex information that people could easily collaborate to enhance, and at the same time it made plain-text email more difficult to use. But it still showed one can build stuff on top of email, as does the notion of MIME types and attachments. We could still use something better, and many agree, but upgrading email seems like an impossible task and not worth the bother. Backwards compatibility is nice, so a new system would ideally need to support conventional email, perhaps adding extra tagging information inline or in attachments. But that all still seems more complicated than it is worth. But what if there were other good reasons to switch to a new communications platform?
I'd suggest the future may lie not so much in reinventing mailing lists and email with authentication as in moving entirely to a new multi-purpose distributed paradigm like the "Social Semantic Desktop" (SSD) idea:
http://www.semanticdesktop.org

From there:

"The Internet, electronic mail, and the Web have revolutionized the way we communicate and collaborate - their mass adoption is one of the major technological success stories of the 20th century. We all are now much more connected, and in turn face new resulting problems: information overload caused by insufficient support for information organization and collaboration. For example, sending a single file to a mailing list multiplies the cognitive processing effort of filtering and organizing this file times the number of recipients -- leading to more and more of peoples' time going into information filtering and information management activities. There is a need for smarter and more fine-grained computer support for personal and networked information that has to blend the boundaries between personal and group data, while simultaneously safeguarding privacy and establishing and deploying trust among collaborators."

That page goes on to provide more details, including:

"P2P and Grid computing, especially in combination with the Semantic Web field, develops technology to interconnect large communities without centralized infrastructures for data and computation sharing, which is necessary to build heterogeneous, multi-organizational collaboration networks."

From one point of view, you could imagine a Social Semantic Desktop as a really smart email client, where you could send emails to mailing lists saying how to tag previous emails. But one can also imagine supporting all sorts of backends besides email to let small workgroups communicate in various ways.
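The sketch below shows one way a "really smart email client" could send tags for an earlier message over an ordinary list. It uses Python's standard `email` and `xml.etree.ElementTree` modules. The XML vocabulary (`<tags>`, `<tag>`), the `tags.xml` filename and the addresses are hypothetical. The tag message threads under the original via In-Reply-To and References, so conventional clients still show it in context. A tag-aware client reads the XML attachment instead.

```python
import xml.etree.ElementTree as ET
from email import message_from_bytes, policy
from email.message import EmailMessage


def make_tag_email(sender, list_addr, target_id, tags):
    """Build a normal email whose XML attachment tags an earlier message.

    The <tags>/<tag> vocabulary and the "tags.xml" filename are hypothetical.
    """
    root = ET.Element("tags", {"target": target_id})
    for name in tags:
        ET.SubElement(root, "tag", {"name": name})
    payload = ET.tostring(root, encoding="utf-8")

    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = list_addr
    msg["Subject"] = "[tags] " + target_id
    # Thread under the original so ordinary mail clients still show context.
    msg["In-Reply-To"] = target_id
    msg["References"] = target_id
    # Human-readable body for people without a tag-aware client.
    msg.set_content("Tags for " + target_id + ": " + ", ".join(tags))
    msg.add_attachment(payload, maintype="application", subtype="xml",
                       filename="tags.xml")
    return msg


def read_tags(msg):
    """Return (target_id, [tag, ...]) from a tags.xml attachment, or None."""
    for part in msg.iter_attachments():
        if part.get_filename() == "tags.xml":
            root = ET.fromstring(part.get_content())
            return root.get("target"), [t.get("name") for t in root.findall("tag")]
    return None


# Simulate a trip through a mailing list: serialize to bytes and parse back.
sent = make_tag_email("paul@example.org", "squeak-dev@example.org",
                      "<msg1@example.org>", ["vm", "interesting"])
received = message_from_bytes(sent.as_bytes(), policy=policy.default)
print(read_tags(received))  # ('<msg1@example.org>', ['vm', 'interesting'])
```

A forged tag message is as easy to send as a forged post. A real deployment would therefore probably want these messages signed and checked against the reader's trust settings.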
And one can imagine having all sorts of applications running on top of this distributed infrastructure (my wife and I have a couple of new ones in mind related to manufacturing and storytelling). Squeak does have distributed applications like Croquet,
http://en.wikipedia.org/wiki/Croquet_project
but I'm talking about a more common infrastructure for all these ideas. The TeaTime TObject idea is great for what it does with Croquet. Perhaps, though, it is not the only approach to abstraction for the general problem of collaborative work on information?

I have some notions on this myself, which I have been pursuing in Jython for the JVM:
http://sourceforge.net/projects/pointrel/
Sorry it's not in Squeak. Earlier versions of that code, from years back before the SSD focus, were for Squeak. Essentially, I've been working on a triple store called Pointrel on and off for about a quarter century. That work even predates WordNet, which was perhaps inspired by it in a tiny way. Why isn't it done yet? Good question. :-)

I'd suggest that using Squeak as the basis for a similar or better Social Semantic Desktop might be the day that Squeak conquers the world. :-) Or at least a bigger part of it than it already has with Seaside, Croquet, Scratch, etc. :-) Maybe it could be done using TeaTime, but here is another possibility, coming at this situation from a different perspective. Here is a rough idea of what I think is a good architecture for a Social Semantic Desktop, which I am working towards right now. I'm working in Jython to leverage Java, but obviously anybody could work on one in Squeak as friendly co-opetition. People are welcome to sign up for the Pointrel SSD mailing list on SourceForge and use it for now as a place to bounce around ideas for a Squeak version. That list:
http://sourceforge.net/mailarchive/forum.php?forum_name=pointrel-discuss
Obviously, the hope is to replace that list using the system itself. :-) So, it's just for bootstrapping. :-)

Basic implementation ideas:

* Internally, information is stored in the equivalent of RDF triples, http://en.wikipedia.org/wiki/Resource_Description_Framework . I'm using a variant of a triple with a context field as well, like NEPOMUK does, http://nepomuk.semanticdesktop.org (NEPOMUK is an SSD attempt for KDE). Triples are a general-purpose way to store information, essentially just saying how digital objects link together. One might expect there would be ways to place Smalltalk objects into sets of triples (or even just strings) and get them back out again. Each of the four fields I'm using has both a data part and a namespace describing how to interpret the data. Using the RDF naming convention, there are "subject", "predicate", and "object" fields. (I actually prefer "object", "attribute", "value", which are more OO-like.) The fourth field, "context", is sort of equivalent to a file type and file name. In the approach I am currently taking, these triples are defined using "transactions", which are sets of triples to add to or remove from the triple store all together.

* Objects are essentially defined by these triples. Their complete history is implicit in the list of all transactions, which is stored on disk like a Smalltalk changes file. (Alternatively, transactions might be stored as lots of little files with UUID names until they are periodically joined together into a larger file.) Storing the entire history of the shared database is wasteful in some ways, but disk space is cheap these days. Likely the whole history will be in memory as well, and that's how I'm doing the Jython implementation. For any implementation, the RDF database is either entirely in memory (easy for a first try, and when coding it yourself) or on disk and cached.
Multiple RDF triple stores already do this efficiently on disk with a memory cache, including query languages etc. To my knowledge, though, none of them use distributed transaction files like the ones I mention below; maybe one does?

* Applications like email, shared to-do lists, virtual worlds, shared eToys, source code control systems, etc. would then all run on top of this distributed system. Not every client might have all transactions, in which case their data may have inconsistencies, but applications should be written to be forgiving of this. :-) Applications that are less forgiving would require more exacting backends than email lists, such as a central database, an IRC-like system, or TeaTime for real-time, low-latency applications. Essentially, different distributed databases might have different preferred coordination backends, depending on the need for reliability, low latency, or public accessibility. This system could potentially replace the Squeak changes file, the sources file, and various Squeak source control systems someday, say if each change to the system was written out as a transaction in an XML file. (A few years back, as an experiment, I also implemented an OO-like system purely on triples that supported a crude Smalltalk-like environment. That's another issue. For now I just see this approach as a flexible shared database that transmits data usually held in objects, but I point out that proof of concept to show how general-purpose triples can be.)

* Information could be exchanged as "transactions" distributed in some way. Options include mailing list email attachments, a source code control system like git, a CGI-based system, a remote database, WebDAV, TeaTime, or whatever backends one has access to for transmitting these transaction files (or the equivalent).
I have not implemented this part yet. The system I have so far just uses one big file. For shared use, it can redirect changes to that file through a CGI script, which is somewhat like a mailing list archive that clients can consult to get all the recent changes they don't have yet. These transactions could have digital signatures to make forgeries more difficult (though I have not implemented that either). Each transaction is mainly a list of triple additions or deletions that modify the triple store, with a timestamp, an author, and the licenses granted for each action. (In general, the system currently goes overboard with explicit license granting.) Each transaction also has a timestamp, and the file also specifies which distributed database it goes with (via a UUID). It might make sense to have more than one transaction in a file, perhaps. XML is probably an OK choice for these transaction files, which in a sense represent complex messages about how to change the state of a triple store. I currently use a different plain-text format that is easy to read. However, I'm thinking of bowing to the inevitable standard of XML, especially so that email attachments could just be innocent-looking XML files. It really does not matter much what the interchanged data looks like, and the local repository could be in a different format. Each transaction has a UUID, and it is OK to receive the same transaction twice as long as both copies are identical.

There are some other details, but that is the basic picture. Maybe people here who are well-versed in OO design or distributed systems have better ideas. Perhaps some could point to existing systems that would be better matches than what I propose, for either Squeak or the JVM. If so, I'd appreciate hearing about them.
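The brute-force, all-in-memory core can be quite small, as this rough Python sketch of the data model described above illustrates. It is not Pointrel's code or file format. The names Term, Quad, Transaction and QuadStore are hypothetical, and digital signatures and per-action licenses are left out. The sketch includes the following pieces:

* Each of a quad's four slots pairs a namespace with data.
* A transaction adds and removes quads together under its own UUID.
* An append-only log plays the role of the changes file.
* An identical duplicate is ignored, while a different transaction reusing a UUID is rejected.
* `missing_for` answers the "which changes don't I have yet?" question that a CGI archive would serve.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Term:
    """One quad slot: data plus a namespace saying how to interpret it."""
    namespace: str
    data: str


@dataclass(frozen=True)
class Quad:
    """An RDF-style triple plus a context slot (roughly a file type and name)."""
    subject: Term
    predicate: Term
    object: Term
    context: Term


def _utc_now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class Transaction:
    """Quads added and removed together; field names are illustrative only."""
    database_id: str                  # UUID of the shared database it belongs to
    author: str
    adds: tuple = ()
    removes: tuple = ()
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(default_factory=_utc_now)


class QuadStore:
    """Brute-force store: everything in memory, full history kept as a log."""

    def __init__(self, database_id: str):
        self.database_id = database_id
        self.log = []     # every transaction in arrival order, like a changes file
        self.by_id = {}   # transaction UUID -> transaction
        self.quads = set()

    def receive(self, tx: Transaction) -> bool:
        """Apply tx; return False if an identical copy was already applied."""
        if tx.database_id != self.database_id:
            raise ValueError("transaction is for a different database")
        seen = self.by_id.get(tx.id)
        if seen is not None:
            if seen != tx:
                raise ValueError("conflicting transaction with id " + tx.id)
            return False  # receiving the same transaction twice is harmless
        self.by_id[tx.id] = tx
        self.log.append(tx)
        # Removing a quad this peer never saw is a silent no-op (forgiving).
        self.quads.difference_update(tx.removes)
        self.quads.update(tx.adds)
        return True

    def missing_for(self, known_ids) -> list:
        """Transactions a peer lacks, in local log order."""
        return [tx for tx in self.log if tx.id not in known_ids]

    def match(self, subject=None, predicate=None, obj=None, context=None) -> list:
        """Quads equal to every non-None slot of the pattern."""
        pattern = (subject, predicate, obj, context)
        return [
            q for q in self.quads
            if all(p is None or p == v
                   for p, v in zip(pattern, (q.subject, q.predicate, q.object, q.context)))
        ]


# Usage: one database, one quad added and then removed.
DB = str(uuid.uuid4())
store = QuadStore(DB)
paul = Term("urn:example:person", "paul")
name = Quad(paul, Term("urn:example:attr", "name"),
            Term("text/plain", "Paul"), Term("urn:example:file", "people"))
t1 = Transaction(DB, "paul", adds=(name,))
print(store.receive(t1), store.receive(t1))  # True False
print(len(store.match(subject=paul)))        # 1
t2 = Transaction(DB, "paul", removes=(name,))
store.receive(t2)
print(len(store.match(subject=paul)))        # 0
```

Transactions are applied in arrival order. Two peers that receive them in different orders can therefore end up with different quad sets. The sketch does not resolve this. It is the sort of inconsistency that applications on top would need to tolerate, or that a more exacting backend would need to prevent.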
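XML is suggested as the likely interchange format. The sketch below writes and reads a single transaction with Python's `xml.etree.ElementTree`. The element and attribute names, the UUIDs and the timestamp are invented for illustration; they do not reflect the author's current plain-text format. Here a transaction is a plain dict whose "add" and "remove" lists hold quads, each made of four (namespace, data) pairs.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; the slot order follows subject/predicate/object/context.
SLOTS = ("subject", "predicate", "object", "context")
HEADER = ("id", "database", "author", "timestamp")


def transaction_to_xml(tx: dict) -> bytes:
    """Serialize a transaction dict to UTF-8 XML bytes."""
    root = ET.Element("transaction", {k: tx[k] for k in HEADER})
    for op in ("add", "remove"):
        for quad in tx[op]:
            el = ET.SubElement(root, op)
            for slot, (namespace, data) in zip(SLOTS, quad):
                # Each slot keeps its namespace next to its data.
                ET.SubElement(el, slot, {"ns": namespace}).text = data
    return ET.tostring(root, encoding="utf-8")


def transaction_from_xml(blob: bytes) -> dict:
    """Parse bytes produced by transaction_to_xml back into a dict."""
    root = ET.fromstring(blob)
    tx = {k: root.get(k) for k in HEADER}
    for op in ("add", "remove"):
        tx[op] = [
            tuple((el.find(slot).get("ns"), el.find(slot).text or "") for slot in SLOTS)
            for el in root.findall(op)
        ]
    return tx


example = {
    "id": "0b6f5c1e-0000-4000-8000-000000000001",        # made-up UUIDs
    "database": "0b6f5c1e-0000-4000-8000-0000000000db",
    "author": "paul",
    "timestamp": "2009-01-20T12:00:00+00:00",            # made-up time
    "add": [(("urn:example:person", "paul"), ("urn:example:attr", "name"),
             ("text/plain", "Paul"), ("urn:example:file", "people"))],
    "remove": [],
}
xml_bytes = transaction_to_xml(example)
assert transaction_from_xml(xml_bytes) == example
print(xml_bytes.decode("utf-8"))
```

The namespace is kept as an attribute beside each value, which mirrors the idea that every field carries its own interpretation. A file holding several transactions could simply wrap multiple `<transaction>` elements in a common root.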
Here is one somewhat related idea released as open source by NASA:
http://infolab.stanford.edu/~maluf/papers/xdb_ipg_ggf03.pdf

"This paper describes XDB-IPG, an open and extensible database architecture that supports efficient and flexible integration of heterogeneous and distributed information resources. XDB-IPG provides a novel “schema-less” database approach using a document-centered object-relational XML database mapping. This enables structured, unstructured, and semi-structured information to be integrated without requiring document schemas or translation tables. XDB-IPG utilizes existing international protocol standards of the World Wide Web Consortium Architecture Domain and the Internet Engineering Task Force, primarily HTTP, XML and WebDAV. Through a combination of these international protocols, universal database record identifiers, and physical address data types, XDB-IPG enables an unlimited number of desktops and distributed information sources to be linked seamlessly and efficiently into an information grid. XDB-IPG has been used to create a powerful set of novel information management systems for a variety of scientific and engineering applications."

Mine is mainly just simpler. :-)

I do sometimes waffle about going back to Squeak to do this (just as I waffle about spending time on getting a Squeak-like system on the JVM). I am really impressed by Scratch, for example, as a stand-alone application. If there were a motivated group of people here interested in such a system, I might be tempted to move back to the Squeak side for a time. I might use it for prototyping and then see how it goes after that, especially as the Squeak license issues are getting cleaned up. However, I'd be very rusty in Squeak, so it is not my first choice given my own expertise in Jython.
But the core I describe here is not very hard to build in any language if you take the brute-force approach (everything in memory when you want to use a distributed database). What takes the most time is writing the GUI applications on top of the core: things that work like an email list, like Wikipedia or Knol, like SVN, or like any of many other systems for stand-alone work or collaboration. Squeak has the rudiments of many of those things. So there is a good argument that one could build on top of, say, Celeste as an email client, or use Scamper to build a distributed wiki system. This sort of project might really be helped by all the years of hard work people have put into Squeak applications. It would leverage all those somewhat copycat Pink Plane efforts in a radically new, Blue Plane, distributed-database sense. Even the original Augment code that has been redone in Squeak could be integrated for fun. In any case, I'd be happy to discuss these issues with Squeakers who want to build their own system, even if I stay with Jython, with an eye to compatibility for the files or other protocols used to interchange transactions or handle other issues. Then again, the more I think about leveraging all those previous Squeak efforts to build a self-contained environment, the more interesting it sounds to do this in Squeak. Still, there obviously are free and open source clients for email, web browsing, etc. in a lot of languages. Once the system was defined and usable, one would expect many other people to adopt the distributed back end for their own systems (like writing Thunderbird plugins or whatever).
I know this all sounds ambitious, but the key idea is that this is workgroup software. All it takes to make it useful is a few people who want to work together (as with Croquet), and an email gateway lets the rest of the world that does not adopt the system stay in touch with the workgroup.

Anyway, I know I'm taking advantage of an unfortunate situation to toot my own project's horn, sorry. And I probably would not have sent this email if I had not seen that spam, as I'm mostly into Jython right now. Still, even regular spam is very annoying. I already pretty much never see email sent to me unless it:

* Is on a mailing list I have a filter for (but this spam got around that, which suggests that approach won't work much longer for public lists),
* Has my name in it (even that is getting iffy, since after I signed up for some Google groups, spammers somehow now connect my name and email),
* Has one of a few other common terms I am interested in (actually the spam to squeak-dev triggered a filter I have; otherwise I might not have noticed it, since I don't follow this list closely these days), or
* Is from a whitelist of senders I know and added manually (and that may fall apart if spammers improve their forgeries).

The rest of my recent email sits in a pile of over 40000 unread messages. They are all spam, I hope, though sometimes I search the pile and notice something that slipped by. That is 40000 spam messages even with SpamAssassin on the email server filtering out the worst ones, and just since I last cleaned that file out some months ago. Granted, my email address has been on the web for more than ten years. But in any case, the best reason to do this is actually not to get rid of spam. It is to enable people to stigmergically refine knowledge-related digital artifacts, as people like Doug Engelbart, Vannevar Bush, Ted Nelson, and Theodore Sturgeon envisioned decades ago.
http://en.wikipedia.org/wiki/Stigmergy

Still, I'd suggest this unfortunate incident might be more motivation for Squeakers to do something about email in general, to the extent technology can help. The Social Semantic Desktop idea above is one approach that some Squeakers might be interested in. No doubt spammers will catch up eventually. In the meantime, though, things may improve for a while, and there will also be the new distributed applications. In the long term, social change to a world of abundance for all may be a better solution, of course, at least for reducing the motivation behind most commercial spam. In that sense, the abundance being facilitated by the internet will help the internet defend itself from spammers in one way or another. :-)

--Paul Fernhout
http://www.pdfernhout.net/