Thursday, May 13, 2004

Pure Abstraction

RDF as data abstraction "Mozilla's central interface for dealing with data is RDF. RDF datasources can have any implementation they like behind them. However, they must provide the information through the nsIRDFDataSource interface that specifies a graph API of nodes and edges. Once this is done, any client can operate with the datasource without reference to the domain objects, methods, or underlying implementation. You just need to know how to manipulate a graph, and you're set.

This is the true potential of RDF. It's not a substitute for databases or XML. RDF is a directed labelled graph. It hides information and provides a data interface which can be used to aggregate multiple datasources that can be themselves RDF datasources. This is the purest data abstraction. Theoretically, if you weren't worried about performance you could access an RDF datasource and not know or care whether you're accessing a database, a web service, or both. The RDF datasource might even optimize its data structures based on what your queries or iteration patterns have been, in much the same way as hotspot compilation optimizes algorithms in the JVM.

A good question at this point is whether RDF is as powerful an API as JDBC. The answer is yes, or not quite. RDF is only a data structure, and there are several APIs for manipulating that data structure. The most well known implementation is Jena. Jena's API for manipulating RDF graphs is very powerful, and encompasses most things you would find in JDBC, such as transactions, a query language, and prepared statements."
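
A rough sketch of the graph-interface idea from the quote above, in Python. None of this is Mozilla's nsIRDFDataSource or Jena's API: `DataSource`, `MemorySource`, `Aggregate` and the `ex:` names are all made up for illustration. The point is only that a client calls `match` and never learns what sits behind it, and that an aggregate of datasources is itself a datasource.

```python
from typing import Iterable, Iterator, Optional, Protocol, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class DataSource(Protocol):
    """Hypothetical graph interface: the only thing a client ever sees."""

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]: ...


class MemorySource:
    """Backed by a plain set here; it could equally wrap a database or a web service."""

    def __init__(self, triples: Iterable[Triple]) -> None:
        self._triples = set(triples)

    def match(self, s=None, p=None, o=None):
        # None acts as a wildcard for that position of the triple.
        for triple in self._triples:
            if all(want is None or want == got
                   for want, got in zip((s, p, o), triple)):
                yield triple


class Aggregate:
    """A datasource built from other datasources, with the same interface."""

    def __init__(self, *sources: DataSource) -> None:
        self._sources = sources

    def match(self, s=None, p=None, o=None):
        seen = set()
        for source in self._sources:
            for triple in source.match(s, p, o):
                if triple not in seen:  # drop statements repeated across sources
                    seen.add(triple)
                    yield triple


mail = MemorySource([("ex:alice", "ex:knows", "ex:bob")])
bookmarks = MemorySource([("ex:alice", "ex:likes", "ex:kowari"),
                          ("ex:alice", "ex:knows", "ex:bob")])
combined = Aggregate(mail, bookmarks)
about_alice = sorted(combined.match(s="ex:alice"))
```

The client code on the last line works the same whether `combined` wraps two in-memory sets or something far heavier, which is the abstraction the quote is describing.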

The Blank Node Release

Kowari 1.0.3 is now available for download. Loading RDF/XML with blank nodes is now faster, inserting statements with blank nodes now works correctly, and blank nodes in Jena are now properly mapped. There are other fixes as well. It's annoying that RDF has named things (about), unnamed things (blank nodes) and locally named things (blank nodes with nodeIDs). iTQL adds variables, which are effectively differently labelled blank nodes too.

The next release should have some inferencing and resolver work done on it, which will be a major change to the architecture - unless we find some critical bug.

Paul has also updated the plans for a new triple store for Kowari/JRDF. I'm favouring the second approach but they both have positives and negatives. The ring structure has some advantages over the current structure (and the first approach) as it greatly reduces the amount of disk usage. I think bloom filters have been put on the back burner.

Wednesday, May 12, 2004

Rewerse

Reasoning on the Web with Rules and Semantics "The objective of REWERSE is to establish Europe as a leader in reasoning languages for the Web by

1. networking and structuring a scientific community that needs it, and by
2. providing tangible technological bases that do not exist today for an industrial software development of advanced Web systems and applications."

Tuesday, May 11, 2004

Bloom Filter Implementation

Based on the bloom filters post a while back it looks like Paul and DM will be doing one - hopefully with a JRDF interface.

I'll have to get JRDF up to scratch. Some recent comments from both Paul and others make me acutely aware that it's definitely not even close to being the last word in Java RDF APIs.

Danny Hillis on the Knowledge Web

"ARISTOTLE" (THE KNOWLEDGE WEB) ""I am interested in the step beyond that," he says, "where what is going on is not just a passive document, but an active computation, where people are using the Net to think of new things that they couldn't think of as individuals, where the Net thinks of new things that the individuals on the Net couldn't think of."

"In the long run, the Internet will arrive at a much richer infrastructure, in which ideas can potentially evolve outside of human minds. You can imagine something happening on the Internet along evolutionary lines, as in the simulations I run on my parallel computers. It already happens in trivial ways, with viruses, but that's just the beginning. I can imagine nontrivial forms of organization evolving on the Internet. Ideas could evolve on the Internet that are much too complicated to hold in any human mind.""

"In this regard, thanks to funding from the Markle Foundation, Danny been able to assemble a group of people to begin to discuss of the implementation of a medical application based on his ideas."

A medical application - hasn't this been done before? Only Jaron Lanier mentions the Semantic Web.

Euroweb

Europe spins the semantic web "A number of exciting new semantic web projects have been recently launched as part of Europe’s latest framework research programme. These projects are being showcased in the 1st European Semantic Web Symposium (ESWS 2004) on 10-12 May in Heraklion (http://www.esws2004.org/ )."

Securing the Semantic Web

Following a question on rdf-interest there have been two interesting links regarding a vocabulary for describing user permissions:
* This document describes the ACL storage and query mechanisms used by W3C, as well as the availability and use of this data on the semantic web.
* Semantic Web Trust and Security Resource Guide and more specifically KAoS Policy and Domain Services: Toward a Description-Logic Approach to Policy Representation, Deconfliction, and Enforcement.

Monday, May 10, 2004

kSpaces.net

kSpaces "...is a metadata-driven, distributed knowledge management platform. It was designed to be lightweight, transparent and extensible...kSpaces automatically tags files through the use of plugins. The two autotagging plugins that are included analyze a file's ID3 and EXIF headers, and then generate the appropriate RDF metadata.

Metadata associated with a file can be viewed and edited through the kSpaces Node application, supported by editor plugins. Five editor plugins have been included in the proof-of-concept, four of which are read only. These plugins allow the management of a subset of Dublin Core metadata, EXIF metadata, ID3 metadata and kSpaces-specific metadata. The Raw RDF plugin shows the raw RDF metadata associated with a knowledge asset."

Cool, an extensible metadata extractor.

Saturday, May 08, 2004

Broken Windows

Something that I've been pushing with Kowari recently.

Don't Live with Broken Windows "You don't want to let technical debt get out of hand. You want to stop the small problems before they grow into big problems. Mayor Guiliani used this approach very successfully in New York City. By being very tough on minor quality of life infractions like jaywalking, graffiti, pan handling—crimes you wouldn't think mattered—he cut the major crime rates of murder, burglary, and robbery by about half over four or five years.

In the realm of psychology, this actually works. If you do something to keep on top of the small problems, they don't grow and become big problems. They don't inflict collateral damage. Bad code can cause a tremendous amount of collateral damage unrelated to its own function. It will start hurting other things in the system, if you're not on top of it. So you don't want to allow broken windows on your project.

As soon as something is broken—whether it is a bug in the code, a problem with your process, a bad requirement, bad documentation—something you know is just wrong, you really have to stop and address it right then and there. Just fix it. And if you just can't fix it, put up police tape around it. Nail plywood over it. Make sure everybody knows it is broken, that they shouldn't trust it, shouldn't go near it. It is as important to show you are on top of the situation as it is to actually fix the problem. As soon as something is broken and not fixed, it starts spreading a malaise across the team. "Well, that's broken. Oh I just broke that. Oh well." "

Thursday, May 06, 2004

Eh

I'm really impressed by Paul's blow by blow report on our inferencing work (he also mentions Lawrence Lessig on Australian radio). We're only doing RDFS at the moment. We'll be doing at least OWL Tiny; we're still deciding. Without Sesame or Jena some of these architectural decisions would have been much harder. I think we've got the right mix of new ideas based upon work that's been done before.

Edd Dumbill won't be attending WWW2004 and that's a shame. Especially because I'm going to be there - hopefully to put faces to names. I think the developer's day looks good - "Doug Cutting, the leader of the Nutch open-source search engine project, and an interactive luncheon Q&A with Tim Berners-Lee." TKS will be there too. TKS is Kowari with other features such as security, related to queries, support and GUI management.

Sunday, May 02, 2004

Kowari linkers

Knowing what people are thinking is important in improving the product. With that in mind I did a quick Google for people mentioning Kowari. I'll also use this post to correct anything that's wrong.

DeliverableS4Simile "Naive use of persistent store in Jena decreases performance by 100x:
* Part of the problem is limited expressivity in RDQL
* Look at performance tuning on databases
* Ryan: Look at the performance of more specialized RDF databases, cf. Kowari"

Open Source Projects That Use Java NIO "The storage engine of Kowari is a transactional triplestore known as the XA Triplestore. All relevant fields of in-memory and on-disk data structures are 64 bits wide..."

Kowari for hundreds of millions of triples "Jim Hendler emailed me in response to my having mentioned on www-rdf-interest@w3c.org that I was surveying triple stores for use in data mining and machine learning. He mentioned a Java-based, non-relational, triple store called Kowari that is available in open source form..."

RDQLPlus "I discovered Kowari last week. The iTQL language is very similar to what I've come up with for RDQLPlus... having equally been inspired by SQL/DDL. Kowari looks like a nice database. I've downloaded it but haven't had a chance to play with it yet..."

Some Tools "Kowari is a layer on top of Jena with OWL reasoning, too"

I would say that Kowari is a layer beneath Jena - one that provides persistence. The OWL reasoning is really only offered at the moment through Jena. However, we will be getting some basic inferencing, at our own query layer, in there soon too.

[protege-discussion] Re: large data sets, bulk data acquisition "I had a really bad time with Kowari earlier this year, it wouldn't compile and then pass its own self-tests...."

This was basically down to problems with Windows. We develop on Linux and OS X and only do QA on Windows. Our initial release had known problems under Windows - which led to failing unit tests but were not fatal for data storage. Anyway, it is fixed now, although Windows does have some drawbacks when it comes to using NIO.

Ontological Software Development

Semantic Mapping, Ontologies, and XML Standards ""Within an ontological framework, integration analysis naturally leads to generalization."

Considering that statement, it's also clear that application independence of ontological models makes these applications candidates for reference models. We do this by stripping the applications of the semantic divergences that were introduced to satisfy their requirements, thus creating a common application integration foundation for use as the basis for an application integration project."

"Once we define the ontologies, we must account for the semantic mismatches that occur during translations between the various terminologies. Therefore, we have the need for mapping.

Creating maps is significant work that leverages a great deal of reuse. The use of mapping requires the "ontology engineer" to modify and reuse mapping. Such mapping necessitates a mediator system that can interpret the mappings in order to translate between the different ontologies that exist in the problem domain. It is also logical to include a library of mapping and conversion functions, as there are many standards transformations employable from mapping to mapping."

Friday, April 30, 2004

Limiting Complexity

The Psychology of Ontology Harmonization "My personal experience seems to suggest the exact opposite: higher the abstraction, lower the objectivity with which people can argue and, therefore, come to an agreement.

I've spoken to people that spent several years of their lives coming up with an ontology and their perception is that the complexity over time of these models to cover a particular domain saturates, does not continue to grow.

This is a basic, but vital assumption for this entire approach to work: if the ontologies grow linearely with the amount of information they can describe, the ontology creation/maintainance process simply won't scale globally."

Intellidimension's SWS

Semantic Web Search "Semantic Web Search is a search engine for the Semantic Web. Our site can be used by both people and computers to precisely locate and gather information published on the Semantic Web."

From the FAQ:
"...using our standard search engine interface you can just type a one or more of keywords describing the information you are trying to locate. This is no more complicated than a traditional Web search engine. However like a traditional Web search engine this can lead to a large number of irrelevant results. To narrow your search you can restrict it to the specific type of resource that you are trying to locate such as a person (FOAF Person) or news article (RSS Item). If your search is still producing a large number of irrelevant results than you can refine it further by specifying one or more specific property values that the resource must have."

Browse the Semantic Web

How to Make a Semantic Web Browser "Two important architectural choices underlie the success of the Web: numerous, independently operated servers speak a common protocol, and a single type of client—the Web browser—provides point-and-click access to the content and services on these decentralized servers. However, because HTML marries content and presentation into a single representation, end users are often stuck with inappropriate choices made by the Web site designer of how to work with and view the content. RDF metadata on the Semantic Web does not have this limitation: users can access the underlying information and control how it is presented for themselves. This principle forms the basis for our Semantic Web browser—an end-user application that automatically locates metadata and assembles point-and-click interfaces from a combination of relevant information, ontological specifications, and presentation knowledge, all described in RDF and retrieved dynamically from the Semantic Web."

Some thoughts on RDF rendering had some ideas on visualizing the Semantic Web too.

The Passion of RDF

"SemanticBible is ...an emerging exploration of new applications of markup and computational linguistic technology to the study of Scripture, with an emphasis on practical tools that encourage understanding and personal transformation."

See also: The Vision of a Semantic New Testament: "Just as important as avoiding commercial barriers to sharing is the requirement that SemANT support existing and emerging standards that enable use across the Internet. To this end, SemANT will build on the Semantic Web Activity of the World Wide Web Consortium (W3C), including XML as a syntactic standard for data interchange, and RDF for ontology-based representation, and DAML/OWL for additional semantic expressiveness.".

Another example of what turns up once a technology has matured, as with PCs, CD-ROMs, hypertext, the Web, etc.

XUL vs XAML

Mozilla, Gnome mull united front against Longhorn "So far, XUL has failed to catch on, and Microsoft questioned whether Mozilla's technology would do much to help Gnome ward off Longhorn's promised threat.

XAML, Microsoft warned, is more potent than XUL in its ability to reflect exactly what's in the operating system.

"XUL is not the multipurpose declarative language that Gnome probably wants," said Ed Kaim, product manager for the Windows developer platform. "People say that when all you've got is a hammer, everything looks like a nail. In the same way, people are trying to figure out how to crush XUL into an OS it really wasn't designed for. The browser is great for a lot of things, but when it comes to robust client side applications, it's not the best."

Another trick will be in reconciling XUL with Gnome's existing user interface technology.

"There are ways to marry them," said Bruce Perens, an open-source consultant who serves as executive director of the Desktop Linux Consortium, a marketing organization. "But it's very difficult to get the two teams working in the same direction. They both went on a several-year tour of technical creation where they sat down and created everything they needed to do GUI [graphical user interface] applications — and they didn't create the same thing. Now to get them together it would take some number of years to resolve the technical diversions.""

Query Use Cases

Query Languages Report "A report by AIFB and Sesame and Jeen Broekstra from the Sesame crew. The Authors know what they are talking about as they are SemWeb developers themselves.

Although a little self advertisement and some missing languages, its a good thing to read. If you need info about RDF Query languages, read it.

My previous demand about "optional joins in queries" is answered by SeRQL."

The report is an excellent example of the current features required from a query language. From what I can tell iTQL implements 11 of the 14. Some of the others are fairly trivial to add support for (like the data type support).

Wednesday, April 28, 2004

Will the Semantic Web Scale?

Apparently, there's going to be a debate at WWW2004 about whether the Semantic Web will scale: "However, with only a few exceptions we noticed that current research and development is focusing on creating new technologies for facilitating the Semantic Web. Available technologies from other disciplines such as databases are rarely reused and adapted. Hence, most Semantic Web systems do not scale to Web-size problems.

Lately, several researchers doubted whether the Semantic Web idea will ever scale for numerous reasons technological [But03], theoretical [van02] and practical [MS03,Sow]. Dedicated workshops on that topic [CKDE03,VDC03] have been organized recently to promote research to improve scalability. We will pick up these three categories of doubt by organizing the panel in three parts discussing each aspect: theory, technology/implementation, and practise."

I'm not sure I agree. There are very few Semantic Web systems that don't reuse existing SQL databases - they just suck at storing triples. With Kowari, and I'm sure with other native stores, the data structures and techniques used are taken directly from databases. They mention "Is the semantic web hype?" (which I responded to) and a few others, although there are no links to syllogism, metacrap or gnomes. BTW, I'm still not sure why you'd want an XML version of OWL.

"Network round-trips are often considerably less costly than the time taken for a transactional database operation due to the need to forcibly log transactional operations which is very costly in terms of disk performance. i.e. network round-trips aren't always the performance bottleneck." From Martin Fowler's First Law of Distribution.

As long as you keep the Semantic Web like the Web there's no real reason why it shouldn't scale.

Google Watching

What can't you find on Google? Vital statistics "Google is famous for being a confident, open company. Its clean, uncluttered search page is supposed to be a metaphor for the organisation behind it. But when you start asking questions about its technology, then the water rapidly becomes murky...One university presentation, for example, claimed that Google handled 150 million queries a day, and 1,000 per second at peak times...If the system is handling a peak load of 1,000 queries per second, he reasoned, that translates to a peak rate of 86.4 million queries per day - or perhaps 40 million queries per day...They also claim to have '4+ petabytes' of disk storage, and have let slip that each server is fitted with two 80 gigabyte hard drives. Now a petabyte is 10 to the power of 15 bytes, so if Google had only 10,000 servers, that would come to 400 Gb per server. So again the numbers don't add up."
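
Just to check the sums in that quote, here's the arithmetic in a few lines of Python. The inputs are only the figures quoted above; the 10,000-server count is the article's own "if", and the last figure (how many servers 4 PB would need at two 80 GB drives each) is my own follow-on calculation, not something the article states.

```python
SECONDS_PER_DAY = 24 * 60 * 60            # 86,400 seconds

peak_qps = 1_000                          # quoted peak queries per second
queries_per_day_at_peak = peak_qps * SECONDS_PER_DAY

total_storage_bytes = 4 * 10**15          # "4+ petabytes", taking 1 PB = 10**15 bytes
servers = 10_000                          # the article's hypothetical server count
per_server_gb = total_storage_bytes // servers // 10**9

fitted_gb = 2 * 80                        # two 80 GB drives per server, as quoted
# Follow-on figure (not in the article): servers needed to hold 4 PB at 160 GB each.
servers_needed = total_storage_bytes // (fitted_gb * 10**9)

print(queries_per_day_at_peak, per_server_gb, fitted_gb, servers_needed)
```

So the quoted 86.4 million and 400 GB figures are at least internally consistent; the mismatch is between 400 GB and the 160 GB actually fitted per server.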

Google Goes Public? The Rich Get Richer "People speculate. People dream. And if the numbers are to be believed, people will drool. The current prediction is that Google, if it decides to sell shares to investors this year, would probably end up with a market value of $20 billion to $25 billion by the end of its first day as a publicly traded company."

Google's Brin Talks on Gmail Future "It was interesting to me that you did finally hit on the word conversation. It seems to me that there's a synergy between the elements of the conversation in the RSS space and what you're doing in the e-mail space.

I think that's very true. Part of the things we've seen why blogs and RSS feeds are such a success is that you can actually read it—you don't have to stop, click back and forth, collect bits and pieces here and there—but it is all presented to you as one. "

Mozilla to Upgrade RDF

RDF module owner "With Benjamin Smedberg, Chase Tingley and Ben Goodger as peers, I took over the module ownership on RDF. We gonna push for both standards conformance (there are new specs out there since early 2004) and scriptability for remote web applications. This will include some serious whacking of the RDF API in Mozilla, as that is not ready for the web by a fair amount."

Tuesday, April 27, 2004

Ant is now more useful

ANT's finally a real build tool "And I can finaly call ANT a real build tool (and Maven can go play in its own cacca for all I care).

In a nutshell, the task lets you reference other build files. This means that you can create common centralized libraries of build files that other people can use on their own projects - all without copy and paste. And believe it or not, the semantics all make sense too. You can provide default tasks and properties, and the importer can override tasks and properties to customize behaviors on a case by case basis if it's required. The end result is that individual project build files are smaller and easier to understand, and common behavior can be achieved across an entire large system in a natural and non-cut-n-pasty manner (I don't know about you, but I always found pasties rather unnatural)."

Also new is macrodef: "Macrodef is a way to define a new Ant task in an Ant build itself. Macrodef allows you to define standard tasks that have attributes and elements given to them when they are called."

Paul's Blog

Paul is a guy who sits next to me at work and is now putting his development notes on a blog. So if you're interested in the inner workings of Kowari take a look. I'll let people guess who DM, AM, and RMI are. :-)

Friday, April 23, 2004

Free Bits of Description Logic

DESCRIPTION LOGICS Includes a bunch of Postscript and PDF files (including the first couple of chapters of the DL Handbook).

Metaweb Graph Updated

New Version of My "Metaweb" Graph -- The Future of the Net "Many people have requested this graph and so I am posting my latest version of it. The Metaweb is the coming "intelligent Web" that is evolving from the convergence of the Web, Social Software and the Semantic Web. The Metaweb is starting to emerge as we shift from a Web focused on information to a Web focused on relationships between things --- what I call "The Relationship Web" or the "Relationship Revolution.""

Semaview Interview

What Is The Semantic Web? "The Semantic Web provides the foundation on which we can build more intelligent Internet applications. It will help everyone find, organize, collect, use and share information more easily."

"My company, Semaview has developed an application called eventSherpa. eventSherpa is making it simple to create and organize schedules and share them over the Internet. Our application automatically creates Semantic Web content transparently without the end user knowing it...Aside from reducing the complexity issue...I believe the largest challenge is convincing application developers to make their data available in semantic format. However it is "a chicken and egg problem" -- the more content available in a semantic format, the more applications that will be developed to take advantage of it; and vice versa."

Knobot

Reto was trying to get Kowari going. Hopefully it will get used for this project. Documentation gives download links and installation instructions.

From Danny Ayers: Knobot PlanetRDF Demo.

XML 2004

The State of XML "As a software developer I feel increasingly unhappy with the development of a monolithic mass of technology building up, only reasonably accessible behind a Java or .NET API. In contrast, the REST model of composed, simple interactions seems more controllable and containable and you can still see the angle brackets in order to check that things are working. There is still plenty of work and experimentation to be done yet with the notion of more document-oriented web services."

"Consequently, even at the low level of operating systems vendors are seeing the need and advantages of implementing metadata storage and manipulation.

This is good. We have the tools to support this, whichever way you swing on the technology issues. RDF & OWL, Topic Maps, W3C XML Schema: all have the right machinery. Unfortunately that's not the biggest issue. The main problem is which terms, schemas, and ontologies to use. That's just not clear right now for most if not all metadata applications. At best, we'll get inconsistently classified information, which defeats the promise of interoperability. More typically, we'll end up with little tagged metadata and islands of de facto proprietary information."

"As an RDF fan, the realization of this truth causes me some pain. The way out is to stop thinking of RDF as an XML application, and look to easier syntaxes such as Turtle and N3."

Wednesday, April 21, 2004

Unix Job Ad

Unix Specialist "Ah, Unix. Its cheapass cousin, Linux, is what all Microsoft users turn to just as their sanity reaches a crossroads. Did you know that Microsoft Word stills spellchecks ‘Unix’ as ‘UNIX’? Man, how 80’s does that look? I can imagine something like that flickering on the screen of a computer you assembled yourself from a crystal radio kit."

"But the point of all that is, Unix is basically a sort of secret society where you either know it, or you don’t. And since most people just really can’t be bothered going through the agonies of learning it, it’s why we have jobs like this: “Unix Specialist”. Of course that means nothing, or at least it means about as much as “Car Specialist” or “Bread Specialist”. Bread Specialist? What the hell is that? What kind of bread? White, multigrain, mixed grain, wholemeal, sourdough? Sliced or unsliced? If sliced, sliced for sandwiches or for toast? Crusty or soft? No matter! Just eat your bread!"

RDF Engine

RDF Engine "The program RDFEngine was developed as a part of the master thesis of Guido Naudts. It was build on the example of Euler, the program of Jos De Roo. The original version was made with Haskell. It was then rewritten in Python. Purpose of the program was to implement a logic program for the Semantic Web initiative. Concerning compatibility the program is meant to be compatible with CWM in the sense that sources that work with CWM will also work with RDFEngine but not vice versa.(I like to do some experiments of my own (-: ). For input and output Notation 3 is used. See also the Notation 3 tutorial."

Tuesday, April 20, 2004

80/20 REST to SOAP

XMLEurope, Monday "The keynotes were by Jeff Barr of Amazon and Steven Pemberton from W3C. Interesting to hear some detail about the Amazon Web Services, including the 80/20 split between developer use of REST and SOAP interfaces to the Amazon catalogue, and the business case for making the catalogue available in machine-processible format: essentially the value is not in their catalogue per se, and they have a business model for third party sellers and affiliates, so WS access just makes this relationship easier. The output example he gave looked to me very close to RDF; with Amazon's XSLT service it could be transformed to it very easily I think."

Pointless

Netscape Desktop Navigator The thing about a circular interface...it has no point...

Monday, April 19, 2004

Phew

Kowari 1.0.2 is now released - much better than the last one. It's even available in a bite-sized 14MB version. Unfortunately, we didn't get the anonymous node bug in our Jena implementation fixed in time. The next release should be within the next 2-4 weeks depending on the progress of the currently outstanding bugs.

BTW, you can now do things like: "select $s $p $o from <http://www.w3c.org/2000/08/w3c-synd/home.rss> where $s $p $o ;". You can combine local and remote (via file or http) models by using "and" and "or" in the FROM clause.
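
As a very rough mental model only (this is not Kowari code, and I'm assuming here that "or" in the FROM clause behaves like a union of models and "and" like an intersection), a query of `$s $p $o` over combined models can be pictured like this in Python. The `select_all` helper and the example statements are invented for illustration.

```python
def select_all(*models, combine="or"):
    """Return every (s, p, o) statement visible in the combined models.

    Assumption: "or" is modelled as set union, "and" as set intersection.
    """
    sets = [set(model) for model in models]
    if not sets:
        return set()
    if combine == "or":
        return set.union(*sets)
    if combine == "and":
        return set.intersection(*sets)
    raise ValueError("combine must be 'or' or 'and'")


# Hypothetical statements standing in for a local model and a remote RSS file.
local_model = {("ex:item1", "rss:title", "Kowari 1.0.2"),
               ("ex:item1", "rss:link", "ex:release")}
remote_model = {("ex:item1", "rss:title", "Kowari 1.0.2"),
                ("ex:item2", "rss:title", "W3C news")}

either = select_all(local_model, remote_model, combine="or")
both = select_all(local_model, remote_model, combine="and")
```

Because the WHERE clause is all variables, every statement in the combined model comes back; a real query would constrain one or more positions.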

Bloom Filters in Social Networks

Building a Bloom Filter in Perl "One drawback of existing social network schemes is that they require participants to either divulge their list of contacts to a central server (Orkut, Friendster) or publish it to the public Internet (FOAF), in both cases sacrificing a great deal of privacy. By exchanging Bloom filters instead of explicit lists of contacts, users can participate in social networking experiments without having to admit to the world who their friends are. A Bloom filter encoding someone's contact information can be checked to see whether it contains a given name or email address, but it can't be coerced into revealing the full list of keys that were used to build it. It's even possible to turn the false-positive rate, which may not sound like a feature, into a powerful tool."
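
The article's code is Perl; what follows is my own minimal Python sketch of the same idea, not a port of it. The sizes, the salted SHA-256 hashing scheme and the email addresses are all assumptions picked for illustration. It shows the membership check described above, plus the bitwise-OR merge of two filters with the same length and hash functions that the same article describes.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits: int = 1024, num_hashes: int = 4) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as the bit array

    def _positions(self, key: str):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def __contains__(self, key: str) -> bool:
        # True means "possibly present" (false positives can happen);
        # False means "definitely not present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))

    def union(self, other: "BloomFilter") -> "BloomFilter":
        if (self.size_bits, self.num_hashes) != (other.size_bits, other.num_hashes):
            raise ValueError("filters must share length and hash functions")
        merged = BloomFilter(self.size_bits, self.num_hashes)
        merged.bits = self.bits | other.bits  # composite filter via bitwise OR
        return merged


mine = BloomFilter()
mine.add("alice@example.org")
yours = BloomFilter()
yours.add("bob@example.org")
both = mine.union(yours)
```

Note that nothing in the filter lets you list the keys that went into it; you can only ask about a key you already have, which is the privacy property the article is relying on.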

"If any one of the filters is intercepted, it will register the full 50% false-positive rate. So I am able to hedge my privacy risk across several interactions, and have some control over how accurately other people can see my network. My friends can be sure with a high degree of certainty whether someone is on my contact list, but someone who manages to snag just one or two of my filters will learn almost nothing about me."

"Additionally, you can combine two Bloom filters that have the same length and hash functions with the bitwise OR operator to create a composite filter."

Sunday, April 18, 2004

Save Our Software

Save Our Software "During the Internet boom, the traditional companies most obviously affected were bricks-and-mortar bookstores and travel agents...But in 2004, the businesses under direct financial assault by the broadband consumer Internet are not toothless independent bookstores. Instead, they are the major music labels, movie studios, and broadcasters."

"First, the FCC is conducting a proceeding to set the guidelines for what it calls 'software defined radio...in the near future, it will be possible use spectrum more efficiently and to increase competition in a space that's now the sole domain of incumbent operators. But only if we keep the FCC from regulating these technologies."

"Second...the FCC generally approved a mandate called the 'broadcast flag,' which would require that digital TV broadcasts include an anti-theft code to prevent consumers from recording programs off the air."
