Sunday, May 30, 2004

The Semantic Web is your Friend, FOAF that is

The Semantic Web is Your Friend "...given the proliferation of data about data (metadata) that underpins the Semantic Web, it is tempting to focus in on the obvious prospect of a better-than-Google search engine...As clever and useful as this may be, it is only an evolutionary enhancement of something that is already possible on the Web today. By looking in a little more detail at what is happening on the Semantic Web today, it is possible to gain a deeper insight into where revolution is starting to occur."

"Applications like FOAF are at the vanguard of the Semantic Web, enabling a glimpse of what might be achieved."

I'm sick of FOAF being held up as the vanguard of the Semantic Web...

A Few New Semantic Web Tools and Browsers

Many of these were presented at the WWW2004 conference:

* Ganesha will be available soon for download.
* Simile and Longwell (a faceted approach using Jena and HTML).
* Haystack the next version - based on SWT and Eclipse.
* SWOOP 2.0 (which changed from using Jena to the Manchester OWL API).

Saturday, May 29, 2004

Ontographics

"Ontographics is another visualization of graphs structures. It's a "...organization and presentation software. It is suitable for managing all kind of information and data: documents, projects, knowledge, and so on. Ontographics makes it easy to present, share and communicate informations to other people."

"It is a fully functional program, but is missing some features from Ontograpics Silver:

* no attachments (Word documents, images, etc) on nodes
* no installer
* no support"

Screenshot.

Goodbye New York

Paul Ford mentioned Kowari/TKS/Tucana Technologies in his writeup: WWW2004 Semantic Web Roundup.

"Many of the Semantic Web projects discussed during the conference used Jena as a backing store, but one contender to Jena's throne is Kowari, from Tucana Technologies."

"Rather than competing directly with Jena, Kowari includes support for the Jena API, as well as JRDF, an alternative RDF-management API, and adds to these APIs with a new SQL-like query language, iTQL, that can be used via a built-in interactive shell."

The TKS/Kowari presentation can be downloaded here. At the Semantic Web Developers Day Kowari got mentioned during the "Harpers.org: a Semantic Web Case Study", Simile and MINDSWAP presentations.

We should continue to make it clear that Kowari is not really going to take Jena's throne. In the short and medium term I can only see us improving Jena support. The slated FastPath and improvements in ontology speed are both things we're looking at putting into Kowari soon.

JRDF's goal is to be an API more appropriate for our underlying triple store. With Kowari, we needed a better abstraction for working with RDF triples, and throwing everything away and redoing it on Jena would've taken too long. The Jena support is basically a mapping of Jena to JRDF (and back again).

It was great to meet a lot of people face to face. And I got a lot of interesting news about various people using or evaluating Kowari. I think it'll lead to some interesting news in the near future.

Wednesday, May 19, 2004

Semantics in P2P and Grids Notes

Not having looked deeply into this in the past, except at The Semantic Grid web site, I came into the workshop without much context as to what the talks were going to be about.

The first talk was about using query expansion on a peer to peer network. It extended Limewire to allow different nodes to expand the keywords using something called Keyword Relationship Databases (KRDB). Weightings are graded based on how successful the data they return is.
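To make the idea concrete, here's a rough Python sketch of query expansion with feedback-adjusted weightings. It is not the paper's implementation: the `KeywordRelationshipDB` class, its method names and the simple add/subtract weighting rule are all assumptions made up for illustration.

```python
from collections import defaultdict


class KeywordRelationshipDB:
    """Hypothetical per-node store of related keywords with weights."""

    def __init__(self):
        # keyword -> {related keyword -> weight}
        self.related = defaultdict(dict)

    def relate(self, keyword, other, weight=1.0):
        self.related[keyword][other] = weight

    def expand(self, keywords, limit=2):
        """Return the original keywords plus the top-weighted related ones."""
        expanded = list(keywords)
        for keyword in keywords:
            ranked = sorted(self.related[keyword].items(),
                            key=lambda item: item[1], reverse=True)
            for other, _ in ranked[:limit]:
                if other not in expanded:
                    expanded.append(other)
        return expanded

    def feedback(self, keyword, other, successful, step=0.5):
        """Raise or lower a weight depending on whether results were useful."""
        current = self.related[keyword].get(other, 1.0)
        change = step if successful else -step
        # Assumed rule: weights never drop below zero.
        self.related[keyword][other] = max(0.0, current + change)


krdb = KeywordRelationshipDB()
krdb.relate("rdf", "semantic web")
krdb.relate("rdf", "owl")
krdb.relate("rdf", "xml")
krdb.feedback("rdf", "owl", successful=True)    # good results: weight goes up
krdb.feedback("rdf", "xml", successful=False)   # poor results: weight goes down
print(krdb.expand(["rdf"]))  # ['rdf', 'owl', 'semantic web']
```

In a real peer-to-peer setting each node would presumably hold its own database and expand queries as they pass through, but that part is left out here.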

The second speaker's presentation was about using Small World Networks (SWN) as a way to scale P2P networks. You place the data and the nodes in a k-dimensional space. The problem then is that you have to keep the number of dimensions reasonable. The speaker gave an example of flattening a 2-dimensional space into a 1-dimensional space by basically turning it into a linked list.
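One way to picture the flattening is below. The ordering used in the talk isn't recorded in these notes, so the "snake" order here is purely an assumption, as are the function names.

```python
def flatten_snake(width, height):
    """Order the cells of a width x height grid as one chain.

    Even rows run left to right, odd rows right to left, so each cell's
    successor in the chain is also its neighbour in the grid.
    """
    chain = []
    for y in range(height):
        xs = range(width) if y % 2 == 0 else range(width - 1, -1, -1)
        chain.extend((x, y) for x in xs)
    return chain


def neighbours(chain, cell):
    """Previous and next cell in the chain, like links in a doubly linked list."""
    i = chain.index(cell)
    prev_cell = chain[i - 1] if i > 0 else None
    next_cell = chain[i + 1] if i + 1 < len(chain) else None
    return prev_cell, next_cell


chain = flatten_snake(3, 2)
print(chain)                      # [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)]
print(neighbours(chain, (2, 0)))  # ((1, 0), (2, 1))
```

The trade-off is that cells that were close in two dimensions, such as (0, 0) and (0, 1), can end up far apart in the chain.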

Bibster was the topic of the third paper. This used a combination of SeRQL, Sesame and JXTA to implement a distributed bibliographic system. It also used similarities in the metadata to route queries and remove duplicates.

The next talk focused on creating truly distributed P2P databases. It appeared that they were using something that looked like lattices to implement schema matching between peers. This was called "correspondence rules". There were also "coordination rules" which dictated how queries were forwarded to peers. It also used JXTA. The main feedback seemed to be that the paper gave names to these problems but didn't really solve them.

coDB was the next topic of discussion, which seemed to be a way of doing First Order Logic (FOL) over P2P networks. I had a problem understanding the point of this one. For example, the paper mentioned that if peers that created the result drop out then the answers from those peers should be removed. The feedback from someone else questioned whether this was useful in an open network. The example brought up was marriage: a peer might make the inference that if one person in a marriage was male the other would be female. That brought up the rather topical example of gay marriages. What one peer inferred may be different, for different reasons, from what another peer inferred. This was a bit beyond the scope of the paper, though it seemed like a valid criticism.

The sixth talk was given by someone from HP. This was pretty much the highlight of the morning, because it specifically talked about RDF and OWL and they used Jena (of course). The main thing to come out of it is that they wanted to be able to calculate properties ("virtual properties") expressible in RDF/OWL. For example, server price is a function of software and hardware price. They created a type of TransitiveProperty called FunctionalProperty. A functional property would define a function like "foo = x * y + z". How this would be expressed in OWL hasn't been defined as far as I know. It sounds similar to aggregate functions like average, sum and count, but fully configurable. I vaguely remember someone at work talking about this. Obviously, a cool idea anyway.
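A minimal Python sketch of what a "virtual property" could look like over a set of triples follows. This is not HP's design or anything in Jena: the property names, the registry and the choice of a plain sum for server price are assumptions for the sake of the example.

```python
# A tiny in-memory triple set: (subject, property, value).
triples = {
    ("server1", "hardwarePrice", 2000.0),
    ("server1", "softwarePrice", 500.0),
}

# Hypothetical registry of virtual properties:
# name -> (input properties, function over their values).
# Assumption: server price is simply hardware price plus software price.
virtual_properties = {
    "serverPrice": (("hardwarePrice", "softwarePrice"),
                    lambda hardware, software: hardware + software),
}


def get_value(subject, prop):
    """Return a stored value, or compute it if the property is virtual."""
    for s, p, o in triples:
        if s == subject and p == prop:
            return o
    if prop in virtual_properties:
        inputs, function = virtual_properties[prop]
        args = [get_value(subject, name) for name in inputs]
        # Only compute when every input property has a value.
        if all(arg is not None for arg in args):
            return function(*args)
    return None


print(get_value("server1", "serverPrice"))  # 2500.0
```

To a client asking for `serverPrice` the computed value looks no different from a stored one, which seems to be the point of the idea.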

The last three talks were about trust, GridDB and ebXML. They were fairly short but interesting. Probably the most interesting thing I got out of it was that OASIS (who are working on ebXML) are trying to be inclusive of current technologies like Topic Maps and RDF/OWL. In fact, the speaker did say that UML was going to be expressible in RDF/OWL. I also thought that GridDB looked a lot like our current TMex product.

Anyway, apart from that I'm really enjoying the conference. Google set up a booth to try and get people to work for them - little did they know I was only after their pen.

Thursday, May 13, 2004

Pure Abstraction

RDF as data abstraction "Mozilla's central interface for dealing with data is RDF. RDF datasources can have any implementation they like behind them. However, they must provide the information through the nsIRDFDataSource interface that specifies a graph API of nodes and edges. Once this is done, any client can operate with the datasource without reference to the domain objects, methods, or underlying implementation. You just need to know how to manipulate a graph, and you're set.

This is the true potential of RDF. It's not a substitute for databases or XML. RDF is a directed labelled graph. It hides information and provides a data interface which can be used to aggregate multiple datasources that can be themselves RDF datasources. This is the purest data abstraction. Theoretically, if you weren't worried about performance you could access an RDF datasource and not know or care whether you're accessing a database, a web service, or both. The RDF datasource might even optimize its data structures based on what your queries or iteration patterns have been, in much the same way as hotspot compilation optimizes algorithms in the JVM.

A good question at this point is whether RDF is as powerful an API as JDBC. The answer is yes, or not quite. RDF is only a data structure, and there are several APIs for manipulating that data structure. The most well known implementation is Jena. Jena's API for manipulating RDF graphs is very powerful, and encompasses most things you would find in JDBC, such as transactions, a query language, and prepared statements."
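The abstraction described in that quote can be sketched in a few lines of Python. This is not Mozilla's nsIRDFDataSource or Jena's API; the `DataSource` interface and the class names below are hypothetical, chosen only to show a client querying a graph without knowing what sits behind it.

```python
from abc import ABC, abstractmethod


class DataSource(ABC):
    """Hypothetical graph interface: all a client sees is triples."""

    @abstractmethod
    def match(self, subject=None, predicate=None, obj=None):
        """Yield (subject, predicate, object) triples; None is a wildcard."""


def _fits(triple, subject, predicate, obj):
    pattern = (subject, predicate, obj)
    return all(want is None or want == have
               for want, have in zip(pattern, triple))


class RowSource(DataSource):
    """Pretends to be a database table: one dict of columns per row id."""

    def __init__(self, rows):
        self.rows = rows

    def match(self, subject=None, predicate=None, obj=None):
        for row_id, columns in self.rows.items():
            for column, value in columns.items():
                triple = (row_id, column, value)
                if _fits(triple, subject, predicate, obj):
                    yield triple


class TripleSource(DataSource):
    """Holds triples directly, as a native RDF store might."""

    def __init__(self, triples):
        self.triples = list(triples)

    def match(self, subject=None, predicate=None, obj=None):
        for triple in self.triples:
            if _fits(triple, subject, predicate, obj):
                yield triple


class Aggregate(DataSource):
    """Presents several datasources as one graph."""

    def __init__(self, *sources):
        self.sources = sources

    def match(self, subject=None, predicate=None, obj=None):
        for source in self.sources:
            yield from source.match(subject, predicate, obj)


graph = Aggregate(
    RowSource({"person:1": {"name": "Alice"}}),
    TripleSource([("person:1", "knows", "person:2")]),
)
print(sorted(graph.match(subject="person:1")))
# [('person:1', 'knows', 'person:2'), ('person:1', 'name', 'Alice')]
```

The client code on the last lines never learns which triples came from the "table" and which from the triple list.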

The Blank Node Release

Kowari 1.0.3 is now available for download. Loading RDF/XML with blank nodes is now faster, inserting statements with blank nodes now works correctly, and blank nodes in Jena are now properly mapped. There are other things. It's annoying that RDF has named things (about) and unnamed things (blank nodes) and locally named things (blank nodes with node IDs). iTQL adds variables which are differently labelled blank nodes too.

The next release should have some inferencing and resolver work done to it which will be a major change to the architecture - unless we find some critical bug.

Paul has also updated the plans for a new triple store for Kowari/JRDF. I'm favouring the second approach but they both have positives and negatives. The ring structure has some advantages over the current structure (and the first approach) as it greatly reduces the amount of disk usage. I think bloom filters have been put on the back burner.

Wednesday, May 12, 2004

Rewerse

Reasoning on the Web with Rules and Semantics "The objective of REWERSE is to establish Europe as a leader in reasoning languages for the Web by

1. networking and structuring a scientific community that needs it, and by
2. providing tangible technological bases that do not exist today for an industrial software development of advanced Web systems and applications."

Tuesday, May 11, 2004

Bloom Filter Implementation

Based on the bloom filters post a while back it looks like Paul and DM will be doing one - hopefully with a JRDF interface.
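For anyone who hasn't met one, here is a small Python sketch of a bloom filter used to test whether a triple might be in a store. It has nothing to do with whatever Paul and DM end up building, and it is not a JRDF interface; the class, the sizes and the use of SHA-256 are assumptions picked to keep the example short.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, some false positives."""

    def __init__(self, size_bits=1024, hash_count=3):
        self.size_bits = size_bits
        self.hash_count = hash_count
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive hash_count bit positions by salting one hash function.
        for seed in range(self.hash_count):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for position in self._positions(item):
            self.bits[position // 8] |= 1 << (position % 8)

    def might_contain(self, item):
        return all(self.bits[position // 8] & (1 << (position % 8))
                   for position in self._positions(item))


def triple_key(subject, predicate, obj):
    """Flatten a triple into one string for hashing (illustrative only)."""
    return " ".join((subject, predicate, obj))


seen = BloomFilter()
seen.add(triple_key("ex:kowari", "rdf:type", "ex:TripleStore"))
print(seen.might_contain(triple_key("ex:kowari", "rdf:type", "ex:TripleStore")))
```

A "no" from the filter is definite, so a store could skip a disk lookup; a "yes" only means the triple might be there and still has to be checked.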

I'll have to get JRDF up to scratch. Some recent comments from both Paul and others make me acutely aware that it's definitely not even close to being the last word in Java RDF APIs.

Danny Hillis on the Knowledge Web

"ARISTOTLE" (THE KNOWLEDGE WEB) ""I am interested in the step beyond that," he says, "where what is going on is not just a passive document, but an active computation, where people are using the Net to think of new things that they couldn't think of as individuals, where the Net thinks of new things that the individuals on the Net couldn't think of."

"In the long run, the Internet will arrive at a much richer infrastructure, in which ideas can potentially evolve outside of human minds. You can imagine something happening on the Internet along evolutionary lines, as in the simulations I run on my parallel computers. It already happens in trivial ways, with viruses, but that's just the beginning. I can imagine nontrivial forms of organization evolving on the Internet. Ideas could evolve on the Internet that are much too complicated to hold in any human mind.""

"In this regard, thanks to funding from the Markle Foundation, Danny been able to assemble a group of people to begin to discuss of the implementation of a medical application based on his ideas."

Medical application? Hasn't this been done before? Only Jaron Lanier mentions the Semantic Web.

Euroweb

Europe spins the semantic web "A number of exciting new semantic web projects have been recently launched as part of Europe’s latest framework research programme. These projects are being showcased in the 1st European Semantic Web Symposium (ESWS 2004) on 10-12 May in Heraklion (http://www.esws2004.org/ )."

Securing the Semantic Web

Following a question on rdf-interest there have been two interesting links regarding a vocabulary for describing user permissions:
* This document describes the ACL storage and query mechanisms used by W3C, as well as the availability and use of this data on the semantic web.
* Semantic Web Trust and Security Resource Guide and more specifically KAoS Policy and Domain Services: Toward a Description-Logic Approach to Policy Representation, Deconfliction, and Enforcement.

Monday, May 10, 2004

kSpaces.net

kSpaces "...is a metadata-driven, distributed knowledge management platform. It was designed to be lightweight, transparent and extensible...kSpaces automatically tags files through the use of plugins. The two autotagging plugins that are included analyze a file's ID3 and EXIF headers, and then generate the appropriate RDF metadata.

Metadata associated with a file can be viewed and edited through the kSpaces Node application, supported by editor plugins. Five editor plugins have been included in the proof-of-concept, four of which are read only. These plugins allow the management of a subset of Dublin Core metadata, EXIF metadata, ID3 metadata and kSpaces-specific metadata. The Raw RDF plugin shows the raw RDF metadata associated with a knowledge asset."

Cool, an extensible metadata extractor.

Saturday, May 08, 2004

Broken Windows

Something that I've been pushing with Kowari recently.

Don't Live with Broken Windows "You don't want to let technical debt get out of hand. You want to stop the small problems before they grow into big problems. Mayor Guiliani used this approach very successfully in New York City. By being very tough on minor quality of life infractions like jaywalking, graffiti, pan handling—crimes you wouldn't think mattered—he cut the major crime rates of murder, burglary, and robbery by about half over four or five years.

In the realm of psychology, this actually works. If you do something to keep on top of the small problems, they don't grow and become big problems. They don't inflict collateral damage. Bad code can cause a tremendous amount of collateral damage unrelated to its own function. It will start hurting other things in the system, if you're not on top of it. So you don't want to allow broken windows on your project.

As soon as something is broken—whether it is a bug in the code, a problem with your process, a bad requirement, bad documentation—something you know is just wrong, you really have to stop and address it right then and there. Just fix it. And if you just can't fix it, put up police tape around it. Nail plywood over it. Make sure everybody knows it is broken, that they shouldn't trust it, shouldn't go near it. It is as important to show you are on top of the situation as it is to actually fix the problem. As soon as something is broken and not fixed, it starts spreading a malaise across the team. "Well, that's broken. Oh I just broke that. Oh well." "

Thursday, May 06, 2004

Eh

I'm really impressed by Paul's blow by blow report on our inferencing work (he also mentions Lawrence Lessig on Australian radio). We're only doing RDFS at the moment. We'll be doing at least OWL Tiny; we're still deciding. Without Sesame or Jena some of these architectural decisions would have been much harder. I think we've got the right mix of new ideas based upon work that's been done before.

Edd Dumbill won't be attending WWW2004 and that's a shame. Especially because I'm going to be there - hopefully to put faces to names. I think the developer's day looks good - "Doug Cutting, the leader of the Nutch open-source search engine project, and an interactive luncheon Q&A with Tim Berners-Lee." TKS will be there too. TKS is Kowari with other features such as security, related to queries, support and GUI management.

Sunday, May 02, 2004

Kowari linkers

Knowing what people are thinking is important in improving the product. With that in mind I did a quick Google for people mentioning Kowari. I'll use it here to correct anything too.

DeliverableS4Simile "Naive use of persistent store in Jena decreases performance by 100x:
* Part of the problem is limited expressivity in RDQL
* Look at performance tuning on databases
* Ryan: Look at the performance of more specialized RDF databases, cf. Kowari"

Open Source Projects That Use Java NIO "The storage engine of Kowari is a transactional triplestore known as the XA Triplestore. All relevant fields of in-memory and on-disk data structures are 64 bits wide..."

Kowari for hundreds of millions of triples "Jim Hendler emailed me in response to my having mentioned on www-rdf-interest@w3c.org that I was surveying triple stores for use in data mining and machine learning. He mentioned a Java-based, non-relational, triple store called Kowari that is available in open source form..."

RDQLPlus "I discovered Kowari last week. The iTQL language is very similar to what I've come up with for RDQLPlus... having equally been inspired by SQL/DDL. Kowari looks like a nice database. I've downloaded it but haven't had a chance to play with it yet..."

Some Tools "Kowari is a layer on top of Jena with OWL reasoning, too"

I would say that Kowari is a layer beneath Jena - one that provides persistence. The OWL reasoning is really only offered at the moment through Jena. However, we will be getting some basic inferencing, at our own query layer, in there soon too.

[protege-discussion] Re: large data sets, bulk data acquisition "I had a really bad time with Kowari earlier this year, it wouldn't compile and then pass its own self-tests...."

This was basically down to problems with Windows. We develop on Linux and OS X and only do QA on Windows. Our initial release had known problems under Windows - which led to failing unit tests but were not fatal for data storage. Anyway, it is fixed now, although Windows does have some drawbacks when it comes to using NIO.

Ontological Software Development

Semantic Mapping, Ontologies, and XML Standards ""Within an ontological framework, integration analysis naturally leads to generalization."

Considering that statement, it's also clear that application independence of ontological models makes these applications candidates for reference models. We do this by stripping the applications of the semantic divergences that were introduced to satisfy their requirements, thus creating a common application integration foundation for use as the basis for an application integration project."

"Once we define the ontologies, we must account for the semantic mismatches that occur during translations between the various terminologies. Therefore, we have the need for mapping.

Creating maps is significant work that leverages a great deal of reuse. The use of mapping requires the "ontology engineer" to modify and reuse mapping. Such mapping necessitates a mediator system that can interpret the mappings in order to translate between the different ontologies that exist in the problem domain. It is also logical to include a library of mapping and conversion functions, as there are many standards transformations employable from mapping to mapping."