Friday, August 29, 2003

Jena 2.0 is out

"Jena is a Java framework for building Semantic Web applications. It provides a programmatic environment for RDF, RDFS and OWL, including a rule-based inference engine."

And missing from the FAQ is "Why Jena?" or "Why is Jena called Jena?".

Wednesday, August 27, 2003

Improving Searching

Putting it Together: Taxonomy, Classification & Search

"Another reason taxonomy, classification and search are being combined, says Feldman, is that not everybody knows exactly how to search. "Often what you want to do is browse a directory because you're not quite sure how to ask the question," she explains. "Taxonomy gives you a display of information that doesn't require you to put your need into words."

Knowledge experts now agree that, as Feldman puts it: "Taxonomy, classification and search need one another." Leading vendors including Autonomy, Convera, Inxight, Stratity and Verity are among those attempting to bring all the pieces together."

""The whole field of autoindexing and classification has made great strides," observes West's Dabney, "but the gold standard is still to have a person look at each document, decide what it's about and assign subjects. It's the volume of information that makes automatic classification necessary, but words have some unique meanings in a legal context, and we don't want to rely on a system that doesn't have a way of dealing with our intellectual domain."

West uses classification techniques that were developed internally, but Dabney says the company isn't doing anything mysterious: "Our system works. You have to give it the category initially, then give it a large training collection. This [works] because we have such a huge library of documents that have been classified manually.""

"Bray contends Google's approach won't work well in an enterprise environment. "The only answers are to generate metadata and to have a better user interface; the two are synergistic, one doesn't work so well without the other.""

Tuesday, August 26, 2003


"The Semantic Web technology is one key to assuring that peer-to-peer can resolve these issues. The Semantic Web technology adequately can describe resource information. RDF (Resource Information Framework) is attracting attention for its flexibility. One of the purposes of this study is to explore a ubiquitous peer-to-peer network architecture that will allow various devices to communicate with one another across various networks."

There are some interesting screenshots of mobile phones running a search using their RDF-enabled P2P technology.

A Platform for Peer-to-Peer Communications and its Relation to Semantic Web Applications

One of the papers presented at the "1st Workshop on Semantics in Peer-to-Peer and Grid Computing"

Monday, August 25, 2003

IBM and Social Software

IBM Takes Search to New Heights "Officials at IBM's T.J. Watson Research Center here discussed with eWEEK this month how it is tackling the problem of understanding unstructured data. Using a combination of artificial intelligence techniques, IBM's UIMA (Unstructured Information Management Architecture) is the foundation for what Paul Horn, IBM senior vice president and director of research, calls "Google on steroids.""

Visualizing how wikis work "Martin Wattenberg and Fernanda Viegas, in IBM's Collaborative User Experience lab have created a tool called historyflow that lets you see the history of a wiki page. The site has many of their observations on observed patterns, and more of mine are below."

Some great diagrams of collaboration at work.

Sunday, August 24, 2003

Legal vs Illegal Monopolies

For a while now I've thought it would be amusing if patents destroyed the use of DRM. Looking at this, it seems that Microsoft may have a way out.

"Intertrust, a Digital Rights Management company owned by Philips and Sony, is suing Microsoft for patent infringement and seems to be making the case stick. Microsoft may soon be forced to pay another huge fine, remove the infringing DRM code from nearly all of its products, or license the DRM code from Intertrust at horrendous expense.

Those are the clues -- trouble for Microsoft in Europe, trouble for Microsoft in Digital Rights Management, and legal trouble for Microsoft in general."

Stream on

Collection of Semantic Web Articles

Rio "The very first separate release of Sesame's RDF I/O package, aka "Rio". Rio contains parsers for RDF/XML and N-Triples as well as writers for RDF/XML, N-Triples and Notation3."

Should Atom Use RDF? "So should Atom be consumed as RDF? It depends. If you want to, and have the right tools, you can. You'll need to transform it into RDF first, but we'll provide a normative way to do that. If you don't want to, then you don't have to worry about it. Atom is XML."

Mark is in error. "Second, what makes Mark think that people who work with RDF/XML are familiar with XSLT or have this installed automatically with our 'tools'? I've worked with RDF/XML in six different programming languages, but I don't work with XSLT -- I think XSLT is the ugliest damn thing in the world to work with. I don't have XSLT support installed."

The Semantic Web is Closer Than You Think "The real achievement of OWL, then, at least as I see it, is to provide a solid foundation, both formally and implementationally, for the Semantic Web. It satisfies one of the necessary conditions of the possibility of there being a Semantic Web at all. And for that, all of us who appreciate the Web, as it is and as it could be, should be grateful."

What's Wrong with RDF/XML?

"Reported RDF/XML problems:
1. One cannot tell an RDF node element/property element by simple inspection of the element in question without knowing the ``striping'' (after Brickley[10]).
2. The frame-style approach does not clearly match the triples in the RDF graph.
3. There are excessive choices in choosing how to write RDF/XML.
4. Elements, attributes and attribute values are used for the same purposes, for example, encoding a URI.
5. The way that XML QNames are used does not constrain the elements and attributes that can appear in RDF/XML.
6. The unconstrained syntax cannot be described completely with XML schema languages such as DTDs and W3C XML Schema.
7. RDF/XML does not use W3C XML Schema datatypes.
8. The syntax is not easy to use with XML technologies such as XSLT, XQuery and other XML tools.
9. It is impossible to embed in XHTML while retaining DTD validation.
10. The syntax is incapable of encoding all legal RDF graphs.
11. In particular, certain graphs with blank nodes cannot be serialised.
12. It uses both namespaced and non-namespaced XML attributes for syntax terms.
13. It is hard to emit human-readable RDF/XML from an RDF graph due to the range of choices (after Carroll[11]).
14. Various aesthetic comments were levelled such as being ``ugly''."

The paper offers three alternative ways of producing RDF in XML format.

Thursday, August 21, 2003

Tipping Point

The Anti-Microsoft Tipping Point: Are We There Yet? "Had enough yet? Did the Blaster worm send you over the edge? Maybe one of its variants is emptying out your hard drive right now. Or perhaps your bottom line is still suffering from a shutdown caused by the SQL Slammer worm back in January."

It seems that some companies have already reached their tipping point: "Ball told his IT department he wanted Microsoft products out of his business within six months. "I said, 'I don't care if we have to buy 10,000 abacuses,'" recalled Ball, who recently addressed the LinuxWorld trade show. "We won't do business with someone who treats us poorly.""

This is such a good story about throwing away Microsoft products and replacing them with Linux. The attitude is very heartening, especially in comparison with the previous article here. And all Microsoft can come up with to keep people coming is stuff like this and DRM.

Wednesday, August 20, 2003

Coverage of OWL

OWL flies as Web ontology language "OWL enables applications such as Web portal management, multimedia collections that cannot respond to English language-based search tools, Web services and ubiquitous computing, W3C said."

W3C Issues OWL as Candidate Recommendation "So what can OWL be used for? The OWL Working Group identified six main areas:

* Web portals, where it can be used to create categorization rules to enhance search
* Multimedia collections, where it can be used to enable content-based searches for non-text media
* Corporate Web site management, where it can be used for automated taxonomical organization of data and documents, as well as mapping between corporate sectors
* Design documentation, where it can be used for explication of 'derived' assemblies (like the wing span of an aircraft) and the explicit management of constraints
* Intelligent agents, where it can be used for expressing user preferences and/or interests, as well as content mapping between Web sites
* Web services and ubiquitous computing, where it can be used for Web service discovery and composition as well as rights management and access control."

W3C Releases Candidate Recommendations for Web Ontology Language (OWL) "The World Wide Web Consortium (W3C) has published a suite of six Candidate Recommendation specifications defining the Web Ontology Language (OWL)."

Douglas Engelbart's Open HyperDocument System and Ontologies

In a followup to a review on "The Semantic Web" Eugene Eric Kim says: "I first got involved with Doug Engelbart's Open Hyperdocument System (OHS) project in April 2000. For the next six months, a small group of committed volunteers met weekly with Doug to spec out the project and develop strategy...I believe, more than ever, that developing shared ontology needs to be an explicit activity when collaborating in any domain. I'll discuss where or whether Semantic Web technologies fit in, in a later post."

OHS Launch Community: Experimenting with Ontologies

"To paraphrase Doug Engelbart, we're trying to make machines smarter. How about trying to make humans smarter?"

Monday, August 18, 2003

Betty's Brain

"Students teach science concepts to a multi-agent intelligent system, Betty's Brain, using concept map representations with a visual interface. The approach is based on cognitive science and education research -- and the old adage about 'you can't teach it if you don't understand it'."

Screenshot from JGraph showcase. The Effects of Feedback in Supporting Learning by Teaching in a Teachable Agent Environment

Sunday, August 17, 2003

Interesting Links

DHTML Lemmings "Let's go!" (Doesn't seem to work properly under Safari).
SCO knew about redistributing their IP under GPL.
Perhaps why Microsoft is against the Semantic Web: "...if the Semantic Web does catch on, there will be tremendous pressure from consumers and businesses for open document standards."
Microsoft finally goes with the power.
Hyper-threading Java. Up to 75% better with Hyper-Threading. The IBM JVM is much better than the Sun one (especially under Linux).

Friday, August 15, 2003

DSpace Continues Apace

"DSpace, the digital archive set up by the Massachusetts Institute of Technology, will have 5,000 items archived by this fall, and plans call for adding 7,500 theses later this year. MIT estimates that the free software has been downloaded 3,400 times and says it's aware of 100 research institutions that are evaluating DSpace with an eye toward archiving their own faculty's publications. The process of moving scholarly work directly into an institutional repository is changing the whole academic publishing model, because the process eliminates the publisher middleman completely in many cases. The new arrangement is preferable, say some scholars, because journals often delay publication of research findings and their audience is limited because they're so expensive."

TECHNOLOGY; In DSpace, Ideas Are Forever (from ShelfLife No. 119).

Thursday, August 14, 2003


SCAM is a content archive management system, developed under the supervision of the KMR group, in cooperation with the Swedish National Agency for Education (Skolverket) and Uppsala Learning Lab. It can be used as a web-based portfolio system or as an interoperable content archive.

SCAM is entirely implemented in Java using the J2EE architecture as its backbone and uses RDF as the metadata representation format. Standards include for example Dublin Core and IEEE LOM for metadata, and IMS Content Packaging for structural information.

SHAME is a framework for developing RDF based metadata editors.

They are using Jena.

Open Source Google

"Nutch provides a transparent alternative to commercial web search engines. Only open source search results can be fully trusted to be without bias. (Or at least their bias is public.) All existing major search engines have proprietary ranking formulas, and will not explain why a given page ranks as it does. Additionally, some search engines determine which sites to index based on payments, rather than on the merits of the sites themselves. Nutch, on the other hand, has nothing to hide and no motive to bias its results or its crawler in any way other than to try to give each user the best results possible."

It probably just harvests the URLs; however, it seeds the crawler using DMOZ's RDF. According to The Inquirer, the Nutch people include: Mitch Kapor, Tim O'Reilly, Peter Savich, Raymie Stata and Doug Cutting.

This led to a link posted by Danny Ayers about using Lucene as a triple store: "For example a triple (document) would be: -> title -> "A great Java developer's website"

This would be just one document in the index."

In TKS we present Lucene as a model and use our own store for the triples and just do joins across the two models.
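The triple-as-document idea quoted above can be sketched roughly as follows. This is only a toy illustration in plain Python, not Lucene's API and not how TKS works: the `TripleIndex` class, its method names and the `http://example.org/...` subjects are all assumptions made up for the example (the subject in the quoted triple is not shown in the source).

```python
from collections import defaultdict


class TripleIndex:
    """Toy stand-in for a full-text index: one 'document' per triple."""

    FIELDS = ("subject", "predicate", "object")

    def __init__(self):
        self.docs = []                    # doc id -> {field: value}
        self.postings = defaultdict(set)  # (field, term) -> set of doc ids

    def add(self, subject, predicate, obj):
        """Store one triple as a three-field document and index it."""
        doc_id = len(self.docs)
        doc = dict(zip(self.FIELDS, (subject, predicate, obj)))
        self.docs.append(doc)
        for field, value in doc.items():
            # Index the exact value plus each lower-cased word in it.
            self.postings[(field, value)].add(doc_id)
            for word in value.lower().split():
                self.postings[(field, word)].add(doc_id)
        return doc_id

    def search(self, **terms):
        """Return the triples that match every field=term pair given."""
        ids = None
        for field, term in terms.items():
            hits = self.postings.get((field, term), set())
            ids = hits if ids is None else ids & hits
        return [self.docs[i] for i in sorted(ids or [])]


index = TripleIndex()
# Hypothetical subjects; the literal is the one from the quote.
index.add("http://example.org/site", "title", "A great Java developer's website")
index.add("http://example.org/other", "title", "Gardening notes")

matches = index.search(predicate="title", object="java")
print(matches)
```

Each triple is a single small document, so a keyword search over the object field hands back subjects that can then be joined against a separate triple store.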

2 Years to Go

"Willard Daggett told about 175 Lincoln and Prairie Grove teachers the World Wide Web, as they know it, will be phased out in two years to be replaced by a more sophisticated semantic web that will provide information based on meaning and concepts rather than individual words."

"In the future you won't surf with words and headers but with meanings and concepts," he added. "How to frame questions to get information is part of the new literacy."

Schools and teachers need to be responsive to the fast-changing world of technology, he said.

"It only took 20 years to get the overhead projector from the bowling alley to the classroom," he quipped."

Data, Loosely Coupled

Semantic integration: Loosely coupling the meaning of data "[In] a standards-based, loosely coupled architecture, when the barriers to application integration are removed, instead of being helpful constructs, these various data structure representations actually get in the way. How information is stored and represented interferes with the meaning of that information."

"In order to follow this "Just-in-time" integration style, for service requesters to be able to consume data in an SOA, the data must be decoupled from any specific technical assumption (such as a specific data schema or format) so that they can be accessed via discoverable, loosely coupled, dynamically bindable services. Now this requirement doesn't mean that the data shouldn't have any structure at all, it just means that the service interface hides the details of that structure from the user, and the service interface itself is dynamically created based on the context of the service requester."

OS XML Database

"eXist is an Open Source native XML database featuring efficient, index-based XPath query processing, extensions for keyword search, XUpdate support and tight integration with existing XML development tools. The database is lightweight, completely written in Java and may be easily deployed in a number of ways, running either as a stand-alone server process, inside a servlet-engine or directly embedded into an application."

Supports 2^31 documents (with up to 2^63 nodes in a document), security, and interfaces via XML-RPC, HTTP, WebDAV and SOAP.

Wednesday, August 13, 2003

About that

JSR-666 - The 'that' keyword "The “this” keyword is very useful to identify members which belong to an object or class. Coming soon in Java 1.6 is the “that” keyword, which allows you to reference members in object that

* may not even exist yet (the forward temporal offset)
* did exist but were garbage collected (the reverse temporal offset)
* exist in the same application but in a different JVM on another server (x-tier referencing)
* exist in the same JVM but in a different dimension (5th dimension array indexing)

A requirement of “that” is that a “this” reference must exist. How else could the extra time or dimensional versions exist without a symbiotic equivalent?""

Unfortunately, it sounds as if C# already has some of this functionality with delegates.

The Emperor's New Groove

It seems that RDF has finally got its groove back. "RDF can be readable?" is more interesting for what's been posted in the comments and the links than for the actual posting (hmm metadata). It seems that Pie/Echo/Atom is unifying RSS 1.0 and RSS 2.0.

Jon Udell has a piece on RDF and symbol grounding; he really likes Shelley's book and querying RDF: "This is cool. RDF triples are relations, and here we see that they're amenable to relational processing. I can grok that." I'm not sure that anyone is willing to say that RDF solves the "symbol grounding problem".
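To make the "triples are relations" remark concrete, here is a minimal sketch in Python. The identifiers (`book:1`, `person:1`), the predicate names and the helper functions `select` and `join` are assumptions invented for illustration; this is not taken from Udell's piece or from the book.

```python
# Hypothetical triples, made up for illustration.
triples = [
    ("book:1", "title", "Practical RDF"),
    ("book:1", "author", "person:1"),
    ("person:1", "name", "Shelley Powers"),
]


def select(triples, predicate):
    """Project the (subject, object) relation for a single predicate."""
    return [(s, o) for s, p, o in triples if p == predicate]


def join(left, right):
    """Join two binary relations where left's object equals right's subject."""
    by_subject = {}
    for s, o in right:
        by_subject.setdefault(s, []).append(o)
    return [(ls, ro) for ls, lo in left for ro in by_subject.get(lo, [])]


titles = dict(select(triples, "title"))
# (book, author name) pairs, found by joining author and name relations.
authors_by_book = join(select(triples, "author"), select(triples, "name"))
result = [(titles[book], name) for book, name in authors_by_book]
print(result)
```

Treating each predicate as a two-column table means ordinary selects and joins answer questions like "what is the name of the author of this title".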

Lots of interesting stuff has been posted about graphs and trees from a recent conference, including RDF Twig (based on Jena), which extends XSLT for RDF.

Wednesday, August 06, 2003

8 bit Semantics

Semantic web 2003: not unlike making music on a TRS-80 in the 1970's "I think this is important in the way that creating better guitar fingerboards or better synthesizer sound algorithms is important. In other words, musicians often like better musical instruments and make use of their increased nuances to make and/or inspire more and better music.

With the semantic web, I don't think we even have many musical instruments yet, at this human interface level. The focus is really on the mechanics, in this case, of data and information. It is like having a TRS-80 and being focused on getting it to produce two musical notes at once or having control over the volume or timbre of the notes."

"The web is already pretty exciting—some people are making creative "musics" with it, and lots of people are tuning in. But, I think the semantic web adds little to that dynamic right now. It isn't accessible enough to affect how "web music" is being made or how "web music" is being heard—the interfaces are not there yet."

I don't know about this argument. Semantic Web technology is as far away as "Bookmarks" under Mozilla. The interfaces already exist for metadata: Apple's iTunes, for example, doesn't need to change if one day it supports RDF rather than ID3 tags. The interface to the baconizer doesn't need to change if they move to RDF. If anything, the Semantic Web will enable richer interfaces, ones that return better results. You're already soaking in Semantic Web interfaces.

Open Source for Education

Sharing the Code "While open-source-code projects like Linux have long been in the public eye, colleges and universities are now beginning to consider collaboration on similar efforts as a relatively cheap, effective way to meet their specialized software and computing needs."

I only mention this article because it lists DSpace - nice to see Semantic Web technology being considered usable enough for deployment.

Tuesday, August 05, 2003

RDF - A Space Odyssey

The Baconizer database contains 472,772 books, CDs, and videos spanned by 3,787,049 links. According to the FAQ these are taken from Amazon's links to related products. It then uses these links to find an association between two items. It's the same as the 6 degrees of Kevin Bacon but with products instead of people.

These queries are made against a graph. What I'd really like to see is if an RDF triple store could beat the Oracle database at these kinds of queries.

Here's an example of "Practical RDF" to "2001 - A Space Odyssey".
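The kind of query described in this post, finding a chain of related-product links between two items, can be sketched as a breadth-first search over a graph. This is an assumption about the general technique, not a description of how the Baconizer is actually implemented; the `shortest_chain` function and the placeholder item names are made up for the example.

```python
from collections import deque


def shortest_chain(links, start, goal):
    """Breadth-first search for the shortest chain of links from start to goal."""
    if start == goal:
        return [start]
    previous = {start: None}  # item -> the item we reached it from
    queue = deque([start])
    while queue:
        item = queue.popleft()
        for neighbour in links.get(item, ()):
            if neighbour in previous:
                continue  # already reached by a path at least as short
            previous[neighbour] = item
            if neighbour == goal:
                # Walk back from the goal to the start, then reverse.
                chain = [goal]
                while previous[chain[-1]] is not None:
                    chain.append(previous[chain[-1]])
                return chain[::-1]
            queue.append(neighbour)
    return None  # no chain of links connects the two items


# Placeholder "related product" links, treated as directed.
links = {
    "item A": ["item B", "item C"],
    "item B": ["item D"],
    "item C": ["item D", "item E"],
    "item D": ["item F"],
    "item E": ["item F"],
}

chain = shortest_chain(links, "item A", "item F")
print(chain)
```

Breadth-first search visits items in order of distance from the start, so the first chain it finds is a shortest one. The same traversal could be run over triples whose predicate is the "related to" link.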

Monday, August 04, 2003

Software History Repeating Itself

Platform and Community Observations from a Mainframe Software Pioneer "We so often trace our antecedents back simply to the Unix heritage, or the Lisp hacker heritage. But when I've talked to IBM old-timers, they make clear just how many of the social dynamics and collaborative software development paradigms of the early mainframe era resemble the open source tradition."

I've been reading From Airline Reservations to Sonic the Hedgehog: A History of the Software Industry, which describes the early history of mainframe development. It covers a group called SHARE, whose only membership qualification was owning the new IBM 704 mainframe. In its first year of operation they claimed to have saved members around $1.5 million (1950s dollars) and had shared around 300 programs.

The development of FORTRAN apparently created a "network effect". Users made an investment in FORTRAN because it was the first reliable language (increasing programmer productivity by 5 to 10 times). This was quite impressive as up to three quarters of the cost of running a computer (staff and machine time) was taken up writing programs - not running them. This led to other manufacturers providing FORTRAN for their machines and users sharing code amongst each other. Programs were considered free - they had no intrinsic value.

Sunday, August 03, 2003

Open Innovation

Are You Open To Innovation? "But the idea behind open innovation is that there are too many good ideas held by people who don't work for you to ignore. Even the best companies with the most extensive internal capabilities have to take external knowledge and ideas into account when they think about innovation. So good ideas can come from outside as well as inside. And they can go to market not only inside your company, but also outside, through others...Of the things that have to become a key focus, one is to be much more externally aware and externally focused before you undertake internal projects. You want to take internal projects in hand to fill a gap that's not being addressed outside or to put the pieces together in new combinations in systems or architectures that are very useful."

Interesting that Procter and Gamble will license technology to competitors and that IBM is the biggest reseller of Sun hardware. Unfortunately, he called Linux a computer language but he's still making valid points here.

Quick MBA's summary on Open Innovation "The research from PARC spawned many successful products, but the shareholders of Xerox did not benefit as much as others did. Employees who worked on promising technologies departed to form start-up companies, many of which, such as 3Com and Adobe, achieved much success. In fact, the market capitalization of Xerox's spin-offs exceeded that of Xerox itself."

Saturday, August 02, 2003

Data Federation

Can XML solve data federation conundrum? "Snapbridge Software, a start-up company based in San Diego, claims to have solved the problem of enterprise data federation with a combination of XML standardization and algorithms developed by its engineering team...The product he helped develop, Snapbridge FDX, is in limited release at “Fortune-class” companies in the financial, publishing, supply chain management and telecommunications industries, with an official launch set for this fall, he said."

Technology "The Semantic Objects that invoke the XML Processing within Snapbridge FDX provide interfaces that allow external entities--web browsers, applications, spreadsheets, databases, and web service--to not only view the results of data federation, but be able to update the data as a transaction and have those updates be applied to external systems."

Friday, August 01, 2003

More Machine Learning Algorithms

The people working on the Weka Machine Learning Project have been busy. This includes an implementation of Learning Vector Quantization algorithms based on the author's understanding of the theory outlined in "Self Organising Maps" by T. Kohonen. A list of all the tools based on Weka is available here.

I've also come across an old series of articles about Instance Based Learning which includes Java source.