More News: 08/01/2004

Monday, August 30, 2004

A consensual hallucination

Semantische Virtuelle Umgebungen RDF used to describe 3D environments. Presentation slides and paper (in English).

"There are a number of open issues regarding RDF modeling, particularly in the areas of inter-environment linking, coordinate systems, and event design. Other issues concern consistency and security in the presence of multiple channels. These are more grave as we cannot change the underlying protocols, and have to work solely with information already available."

Using Kowari's QueryHandler

I've recently been putting builds of Kowari 1.0.5 onto SF's CVS server. This has led to some good feedback, after some initial bug fixing. It has focused what to work on, and the practical upshot is that you can now do RDQL queries (the classes are found in org.kowari.store.jena):

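// Build an RDQL query that selects every statement.
RdqlQuery q = new RdqlQuery("select ?x ?y ?z WHERE (?x ?y ?z)");
// Point the query at the model to be searched.
q.setSource(model);
// Execute through Kowari's query engine and collect the results.
QueryExecution qe = new KowariQueryEngine(q);
QueryResults results = qe.exec();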

This converts the Jena-based query object to Kowari objects, which are then processed by Kowari's query layer and returned as Jena objects.

The only kind of query it doesn't currently handle, although it soon will, is one with repeated variables in the constraint, like:

select ?x
where (?x ?x ?x)
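
To make the repeated-variable case concrete, here is a minimal sketch written against plain Java collections rather than Kowari's or Jena's classes. The class name `RepeatedVariableMatch`, the `match` method and the use of `String[]` for triples are all hypothetical, chosen only for illustration. The point it shows is that when a variable appears in more than one position, every position must bind to the same value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RepeatedVariableMatch {

    // Matches one triple pattern against a list of triples.
    // Pattern terms starting with '?' are variables; anything else is a constant.
    static List<Map<String, String>> match(String[] pattern, List<String[]> triples) {
        List<Map<String, String>> results = new ArrayList<>();
        for (String[] triple : triples) {
            Map<String, String> bindings = new HashMap<>();
            boolean ok = true;
            for (int i = 0; i < 3 && ok; i++) {
                String term = pattern[i];
                if (term.startsWith("?")) {
                    // A repeated variable must bind to the same value every time.
                    String previous = bindings.putIfAbsent(term, triple[i]);
                    ok = previous == null || previous.equals(triple[i]);
                } else {
                    ok = term.equals(triple[i]);
                }
            }
            if (ok) {
                results.add(bindings);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<String[]> triples = List.of(
            new String[] {"a", "b", "c"},
            new String[] {"p", "p", "p"});
        // Only the second triple satisfies (?x ?x ?x).
        System.out.println(match(new String[] {"?x", "?x", "?x"}, triples));
    }
}
```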

There's also recent work on client/server JRDF and Jena, N3 input and output, file upload/download from an iTQL client to/from the server, NOT (unstated) constraint, and other improvements.

Sunday, August 29, 2004

Blogging on RDF

Some recent blogs on RDF:
* Template Languages "There's been some discussion lately about how to improve XUL templates...Some proposals include using an SQL-like language which generates results. This is suitable for database-backed data, and a query language for RDF is being developed."
* HyperRDF: Using XHTML Authoring Tools with XSLT to produce RDF Schemas "XML syntax is a little tedious, but lots of people are evidently willing and able of editing it by hand. RDF adds another layer of tedium, but there are still a few folks willing to write it by hand. I make heavy use of reification/quoting in my representation of logical formulas in RDF. This adds another layer of tedium that I find unmanageable..."
* GUMP + RDF "Ok, so I used RDFLIB to allow Gump to generate some RDF"

Databases for free

The Economics of Software "...doesn't it strike you as odd that your operating system is essentially free, but your database is still costing you forty grand per CPU? Is a database infinitely more difficult to write than an operating system? (Answer: no.) If not, why the enormous pricing discrepancy?"

"The tendency towards open source is especially strong when companies profit not directly from the right-to-use of the software, but rather from some complementary good: support, services, other software, or even hardware...To put this in retail terms, open source software has all of the properties of a loss-leader -- minus the loss, of course."

"Will either of these [MaxDB and Ingres] be able to start taking serious business away from Oracle and IBM?...The economics of software tells us that, in the long run, this is likely the case: either the demand-side will ultimately force sufficient improvements to the existing open source databases, or the supply-side will force the open sourcing of one of the viable competitors."

The Pillars of Longhorn Crumbling?

There used to be three pillars. With the news of WinFS being dropped there are now two.

The WinFS data model had a lot going for it, not least its similarity to RDF. Cairo's promise still seems a way off. Metadata in the file system continues along its tortuous path.

Friday, August 27, 2004

Infoculturists

Collaborative knowledge gardening "Conventional wisdom holds that people will never assign metadata tags to content. It just isn’t on the path of least resistance, the story goes, and those few who do step off the path succeed only in creating unwieldy taxonomies. (Do you file the revised XML Schema specification under xml/specifications or specifications/xml? We can never agree, and many good minds are sacrificed in the vain attempt.) Yet somehow, users of Flickr and del.icio.us do routinely tag content, and those tags open new dimensions of navigation and search. It’s worth pondering how and why this works.

Abandoning taxonomy is the first ingredient of success. These systems just use bags of keywords that draw from — and extend — a flat namespace. In other words, you tag an item with a list of existing and/or new keywords. Of course, that idea’s been around for decades, so what’s special about Flickr and del.icio.us? Sometimes a difference in degree becomes a difference in kind. The degree to which these systems bind the assignment of tags to their use — in a tight feedback loop — is that kind of difference."

"The success of Flickr and del.icio.us won’t necessarily translate to the intranet. You can import the global-hive mind, but you can’t export the local-hive mind. That asymmetry defines the challenge we face as enterprise knowledge gardeners."

Semantic Web Tutorials

Tom has asked me to link to the SWBP&D WG Semantic Web Tutorials "This page provides a central collection of Semantic Web tutorial resources for interested readers and is maintained by the Semantic Web Best Practices and Deployment Working Group."

Specifically, he's after more good tutorials.

Ringfencing

I've linked to The Missing Webs before, here's another good quote:

"A big decision is whether the world model should be open or closed. If closed, we're in effect saying our system knows all it needs to know. If open, what's not specified is unknown. The real world is full of unknowns, but this does not sit altogether comfortably alongside the closed world of traditional relational DBs or the negation by failure of Prolog. A key aspect of the Semantic Web vision is the open world assumption...This doesn't throw out the possibility of using closed-world reasoning on Semantic Web data, as a locally closed world can be defined if required, by harvesting or ringfencing the statements of interest. But conversely filling a relational database with nulls will hamper its capabilities, and query responses can be a more accurate "don't know” rather than a rigid "false”."
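
The difference between the two assumptions can be sketched in a few lines. This is only an illustration: the `OpenWorldQuery` class, the `Answer` enum and the idea of representing statements as strings are assumptions made here, not anything from The Missing Webs. Under the closed-world reading an unstated fact is false; under the open-world reading it is unknown; and a "ringfenced" set of statements can still be queried with the closed-world method.

```java
import java.util.Set;

final class OpenWorldQuery {

    enum Answer { TRUE, FALSE, UNKNOWN }

    // Closed world: anything not stated is taken to be false (negation as failure).
    static Answer askClosed(Set<String> statements, String statement) {
        return statements.contains(statement) ? Answer.TRUE : Answer.FALSE;
    }

    // Open world: anything not stated is simply not known.
    static Answer askOpen(Set<String> statements, String statement) {
        return statements.contains(statement) ? Answer.TRUE : Answer.UNKNOWN;
    }

    public static void main(String[] args) {
        Set<String> statements = Set.of("ex:alice ex:knows ex:bob");
        String question = "ex:alice ex:knows ex:carol";
        System.out.println("closed: " + askClosed(statements, question));
        System.out.println("open:   " + askOpen(statements, question));
    }
}
```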

Typed Literals and Language

RDF literals: "Plain literals have a lexical form and optionally a language tag as defined by [RFC-3066], normalized to lowercase.

Typed literals have a lexical form and a datatype URI being an RDF URI reference."

Why isn't xml:lang information represented within the RDF data model? "The WG subsequently resolved that typed literals would not have a language tag."

Interpretation properties "...they express the relationship between one value and that value interpreted (or processed in the imagination) in a specific way."

Fwd: Typed literals in RDF: "On Friday RDF Core changed the definition of a literal, so that a literal can be one of the following:

A lexical form
A lexical form paired with a language tag
A lexical form paired with a datatype URI"

This is old news (over a year).
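
For anyone modelling this in code, the three forms quoted above can be captured with a small value class. The following is a sketch only; `RdfLiteral` and its factory methods are hypothetical names and do not correspond to Jena's or JRDF's literal classes. It lowercases the language tag, as the quoted definition requires, and offers no way to build a literal with both a language tag and a datatype.

```java
import java.util.Locale;
import java.util.Objects;

final class RdfLiteral {

    private final String lexicalForm;
    private final String language;    // null unless a plain literal with a language tag
    private final String datatypeUri; // null unless a typed literal

    private RdfLiteral(String lexicalForm, String language, String datatypeUri) {
        this.lexicalForm = Objects.requireNonNull(lexicalForm);
        this.language = language;
        this.datatypeUri = datatypeUri;
    }

    // A lexical form on its own.
    static RdfLiteral plain(String lexicalForm) {
        return new RdfLiteral(lexicalForm, null, null);
    }

    // A lexical form paired with a language tag, normalized to lowercase.
    static RdfLiteral plain(String lexicalForm, String language) {
        return new RdfLiteral(lexicalForm, language.toLowerCase(Locale.ROOT), null);
    }

    // A lexical form paired with a datatype URI; no language tag is possible here.
    static RdfLiteral typed(String lexicalForm, String datatypeUri) {
        return new RdfLiteral(lexicalForm, null, Objects.requireNonNull(datatypeUri));
    }

    String lexicalForm() { return lexicalForm; }
    String language() { return language; }
    String datatypeUri() { return datatypeUri; }

    public static void main(String[] args) {
        RdfLiteral chat = plain("chat", "FR");
        RdfLiteral one = typed("1", "http://www.w3.org/2001/XMLSchema#integer");
        System.out.println(chat.lexicalForm() + " @" + chat.language());
        System.out.println(one.lexicalForm() + " ^^" + one.datatypeUri());
    }
}
```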

Thursday, August 26, 2004

Findability on the Cheap

The Low-Cost Cure For 'Bad Search' "The answer, we're told, is either a massive taxonomy and classification initiative, a six-figure investment in a new search engine or the next release of the Windows operating system."

"Good foldering and file-naming practices focus on subject or function (rather than owner/author, creation/edit dates or other values available as metadata). Names should be consistent and concise (with eight or fewer characters) for easy navigation. Avoid repeating information available at a higher or lower level of the hierarchy (you don't need to use "contract" in the file name, for example, if it's in a "Contracts" folder)."

"Better search engines, classification tools and taxonomies have their place — particularly in large organizations that have been through mergers and reorganizations. But don't let big investments become a Band-Aid for poor training and a lack of best practices. With $31 billion in productivity at stake, an ounce of prevention is worth a lot of money. E-mail your best tips, practices and horror stories, and we'll publish the standouts in an upcoming issue."

Urchin-Kowari project

ADTF: Urchin-Kowari "The Urchin-Kowari project replaces the mySQL database with the specialized RDF triple store Kowari. This takes advantage of the fact that all data in Urchin are RDF statements and Kowari is a specialized RDF database. Urchin and Kowari communicate via SOAP, and can be running on different machines. This decoupling of interface and storage allows users the freedom to access any available Kowari server on the Web. In Urchin-Kowari, the core Urchin functionalities are preserved as much as possible. The Web interface retains its look, but query syntax has been modified. Raw iTQL queries to the storage replace the RCQL queries. Many Urchin keywords are being developed. Administrative control of the channels is also preserved."

I don't want to get into one of these GNU/Linux type debates but surely it should be Kowari-Urchin. :-)

Tuesday, August 24, 2004

RDF Defined

Eric Jain defines RDF using Wikipedia's RDF definition: "Those close [...] become passionately committed to possibly insane projects, without regard to the practicality of the implementation or competitive forces in the marketplace."

Also, the Wikipedia definition of Resource Description Framework:
"A collection of RDF statements represents a labeled oriented pseudograph. As such, an RDF-based data model is more naturally suited to certain kinds of knowledge representation than the relational model and other ontological models traditionally used in computing today, although RDF data is often stored in relational databases, and as OWL demonstrates, ontologies can be built upon RDF."

And a pseudograph has: "...both multiple edges and loops. When stated without any qualification, a graph is almost always assumed to be simple."

If you really read it just for the articles

While searching the SEC for smut I found that the SEC has the Google Playboy interview. This has other interesting details like Google's position in the market, details on AdSense, etc. And the mission statement: "...our fundamental goal is to connect people to relevant information and conversely to connect that relevant information to the people who need it."

Java + SW = Jena

How Big Is Your Store? "The repository will end up holding metadata about more than 16 million articles (plus their associated authors, affiliations, publications, etc) and as you'd imagine thats going to end up exploding into a large number of triples."

"I'd be interested to hear about how big a store people have worked with, including which APIs, etc they've been using.

To give a bit more context, as we're mainly a Java shop I've begun by considering any store that can be plugged into Jena. So the Jena persistence model support would be my baseline, with Kowari being another candidate. I see that RDFStore may be adding Jena support so we may explore that too."

16 million triples shouldn't be a problem for many stores.

I find that Jena does impose some limitations on scalability in that it tends to want to use in-memory Models for a lot of things. It also tends to overuse iterators. It's also terribly complicated for something, RDF, that is quite simple.

I've been recently thinking that JRDF, Sesame's RIO, Kowari store and SOFA for OWL inferencing would make a small, scalable solution (around 5MB). Is anyone interested in that though?

Monday, August 23, 2004

Java vs Perl

Java Regex Wrangling "So then I ran it again without the output, just counting the tokens, and yowie zowie, Perl was at 8 minutes 47 seconds, Java back at 3 minutes 4 seconds. So I re-ran on a nearby Debian box, on the theory that the OS X versions of Java and Perl might not be representative of their kind. There are all sorts of variations around I/O and so on, but my finding is that for this problem, the Java 1.4.2 regex processing is somewhere around twice as fast as Perl 5.8.1. Frankly, I’m astounded."

I wrote up a JavaOne presentation on this in 2001. While I couldn't go and it wasn't presented, this is pretty much what I found too.

The best of REST

Implementing REST Web Services: Best Practices and Guidelines " The architecture represented above has a pipe-and-filter style, a classical and robust architectural style used as early as in 1944 by the famous physicist, Richard Feynman, to build the first atomic bomb in his computing team. A request is processed by a chain of filters and each filter is responsible for a well-defined unit of work. Those filters are further classified as two distinct groups: front-end and back-end. Front-end filters are responsible to handle common Web service tasks and they must be light weight. Before or at the end of front-end filters, a response is returned to the invoking client...Most notably, the filters can be considered as a standard form of computing and new filters can be added or extended from existing ones easily. This architecture has good user-perceived performance because responses are returned as soon as possible once a request becomes fully processed by lightweight filters. This architecture also has good security and stability because security breakage and errors can only propagate a limited number of filters."
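
The pipe-and-filter style described in the quote can be sketched very simply. This is not code from the article; `FilterPipeline` is a hypothetical name, and treating a request as a plain `String` is a simplification made for illustration. Each filter does one well-defined unit of work and hands its result to the next one in the chain.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

final class FilterPipeline {

    private final List<UnaryOperator<String>> filters;

    FilterPipeline(List<UnaryOperator<String>> filters) {
        this.filters = List.copyOf(filters);
    }

    // Passes the request through every filter in order.
    String process(String request) {
        String current = request;
        for (UnaryOperator<String> filter : filters) {
            current = filter.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        FilterPipeline pipeline = new FilterPipeline(List.<UnaryOperator<String>>of(
            String::trim,                       // tidy the raw input
            s -> s.toLowerCase(Locale.ROOT),    // normalize case
            s -> "handled:" + s));              // stand-in for the real work
        System.out.println(pipeline.process("  GET /Items  "));
    }
}
```

New filters are added by extending the list, which is the extensibility the article points to.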

Web Services: REST in Peace? "The advantages of conceptualising an application in terms of services and messages, rather than in terms of APIs (either RESTful or RPC oriented varieties) are clear: it enables super-loose coupling both for composing services into applications and in the implementation of a service itself."

Ontaria

"Ontaria is a searchable and browsable directory of semantic web data. Our focus is RDF vocabularies with OWL ontologies, but all the RDF data we index is visible. The site is primarily intended for people creating RDF content who want to better understand which vocabularies are available and how they are being used. Beyond this, Ontaria may be useful for finding and exploring arbitrary RDF content."

Current Database: Last Updated: 29 April, Triples: 18172, Sources: 70.

Friday, August 20, 2004

Radio RDF

RDF Radio "So what's RDF Radio? Mostly vapour right now, but it's a concept Matt Croydon and I have been tossing around for a few weeks. The basic ideas of which are a little similar to Nokia's much touted but equally vapourous Visual Radio.

The main tenet of both is to provide some additional metadata about an existing "Radio" audio stream on a side channel. Visual Radio appears to be quite tightly constrained around proving visual (wap/html based?) user oriented information over GPRS as an adjunct to FM radio. RDF Radio on the other hand is intended to provide timely pure RDF/XML metadata that supplements any broadcast stream; FM, AM, webcast and much more."

Resting on SOFA

I've recently been part of taking a closer look at SOFA (I haven't been doing most of the work though). Initially, it looks too good to be true, OWL in 5 classes. Surely it can't be that easy? It's basically an object model for OWL and RDFS. It includes: Concept, Ontology, Relation, Restriction and Thing. A Concept represents a classification item (class) and a Thing represents a knowledge item (instance). Relations are transitive, symmetric or an inversion. Restrictions are by cardinality or value. It can read and write both OWL and RDFS.

Other features:
* Provides inferencing.
* Support for Java types. This includes mapping to 10 Java datatypes, like String to xsd:string. Other Java objects are base64 encoded - which is a strange feature but kind of cool.
* Events and event listeners (when Things are added, removed or modified).
* Uses URI objects.
* Supports checked exceptions and throws them when you do something wrong.
* Interfaces all the way through.
* DOTWriter which represents an ontology as a directed graph described by Graphviz DOT language syntax.
* Unit tests.

Because it is so small it should be fairly easy to integrate into other stores and APIs.

More information is available here.

RDFStore 0.5

"RDFStore version 0.50 has been released and available on CPAN. This is major release of the toolkit after 3 years - most of the code has been re-written in pure ANSI C and a new native indexing model for RDF has been built in. Several RDF based shallow Semantic Web applications has been successfully built using the toolkit from Asemantics S.r.l and its partners."

"Future developments will include support for upcoming DAWG/BRQL query language, pure C query interface, ODBC and JDBC APIs. Together with a Jena / Java and PHP front-ends."

Available for download.

From the CHANGE LOG: "A brand new indexing model for RDF data has been developed in this release - such a indexing model allows to store quite efficienty "pedantic" RDF descriptions by leveraging on compression of the index of triples using a custom RLE+VLE compression algorithm written in C and XS. It is under investigation the real need to have a custom compression algorithm for RDFStore instead of using one in the public domain; some tests showed that general purpose algorithms like LZO or LZF perfom worse in the specific case of RDFStore where the sparse matrix contains well-known patterns."
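
To give a feel for what "RLE+VLE" means, here is a generic sketch of run-length encoding combined with a variable-length integer encoding. It is emphatically not RDFStore's algorithm, which is written in C and XS and whose details are not given in the change log; the class `RleVarint`, the 7-bits-per-byte scheme and the assumption of non-negative values are all choices made here for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

final class RleVarint {

    // Writes an int using 7 bits per byte; a set high bit means "more bytes follow".
    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Encodes the input as (run length, value) pairs, each number written as a varint.
    static byte[] encode(int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < values.length) {
            int run = 1;
            while (i + run < values.length && values[i + run] == values[i]) {
                run++;
            }
            writeVarint(out, run);
            writeVarint(out, values[i]);
            i += run;
        }
        return out.toByteArray();
    }

    // Reverses encode(): reads (run length, value) pairs and expands each run.
    static int[] decode(byte[] data) {
        List<Integer> result = new ArrayList<>();
        int pos = 0;
        while (pos < data.length) {
            int[] pair = new int[2]; // pair[0] = run length, pair[1] = value
            for (int k = 0; k < 2; k++) {
                int shift = 0;
                int b;
                do {
                    b = data[pos++] & 0xFF;
                    pair[k] |= (b & 0x7F) << shift;
                    shift += 7;
                } while ((b & 0x80) != 0);
            }
            for (int r = 0; r < pair[0]; r++) {
                result.add(pair[1]);
            }
        }
        return result.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        int[] sparseRow = {0, 0, 0, 0, 7, 7, 300};
        byte[] packed = encode(sparseRow);
        System.out.println(sparseRow.length + " values packed into " + packed.length + " bytes");
    }
}
```

Long runs of identical values, as in a sparse matrix, collapse to a pair of small numbers, which is the kind of pattern the change log says general-purpose compressors handled worse.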

Tuesday, August 17, 2004

XSLT for RDF/XML

Rdf Validation Stylesheet "RdfValidationStylesheet is an XSLT 1.0 self contained stylesheet, designed to validate RDF in XML. It can be run as a standalone checker, or can be used to preprocess RDF documents before they are processed with the RdfToTriplesStylesheet. RdfToTriplesStylesheet needs a preprocess validation stage since it does no validation itself."

Rdf To Triples Stylesheet converts RDF/XML to N3.

Monday, August 16, 2004

SW the board game

Glass bead games "A realization of Hesse's Glass Bead Game is presented. By associating small images ("beads" and "tiles") with ideas described in ordinary prose, a new vocabulary of glyphs is developed...In particular, arrangements take the form bead-tile-bead, signifying subject-predicate-object assertions...The entire structure, including narrative, bead phrases, and imagery is represented in the technical forms of the Semantic Web; all beads and tiles are labeled with URIs, and bead phrases become reified RDF. "

The Game: "The Game is a board game, with 1 - 10 players playing on a board with 4 concentric rings of hexagons, representing the developing stages of a relationship: Talk, Know, Like, Trust."

For ages 10-100.

Semantic Worldwide Web

Multilingual Semantic Web work "Following the successful spanish-language workshop in Spain, the SWAD-E project will be taking advantage of an opportunity to participate in a workshop in Argentina in August. This continues the outreach to non-english speaking developers that is one of the goals of the project."

Also from ESW, "An occasional meeting of W3Québec...Material produced for the session included a french translation of Hera, a tool designed to help assess web accesibility and produce results in RDF, as well as a french version of my "Introduction to RDF via CWM" (I really should make an english version, although there is plenty of material in english already)."

Also, the First Italian Workshop on "Semantic Web: Applications and Perspectives".

Sunday, August 15, 2004

Semantic BioWeb

A Semantic Web For Biodiversity Informatics "The Semantic Web extends the existing World Wide Web by structuring and linking information so as to be understandable by machines as well as by humans, facilitating search and discovery. This seminar will explore its potential for biodiversity informatics, drawing upon an NSF-funded informatics project which is using the field of invasive species information management as a testbed. Focal areas of this project include developing ontologies to formally describe the relationship between elements of biological information sets and developing networked species distribution models."

It Just Hasn't Stopped Moving Yet

The RDF Glass Ceiling "In fact, the earlier enthusiasm for RDF in 2001 and 2002 seems to have flickered in 2003, and is now drastically waning in 2004."

From my perspective, RDF has never seen so much activity, but maybe that's relative. I didn't really start tracking RDF news until 2002.

JRDF 0.3 Released

JRDF 0.3:
* New in-memory implementation (by Paul).
* Split out NodeFactory into TripleFactory and GraphElementFactory.
* Added Container (Bag, Alternative and Sequence support) and Collection support.
* Added visitor pattern for typed nodes (URIReference, BlankNode and Literals).

I found the exact details of Bag, Alternative and Sequence and the issue of list elements versus "_1, _2, ..." a little confusing. For example, Bag and Sequence both have the phrase "possibly including duplicate members" but Alt doesn't include a similar phrase. The description of Alt suggests it doesn't support duplicates; I hope I got it right.
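
As a rough illustration of the "_1, _2, ..." form, the sketch below turns a list of members into container statements using the numbered membership properties of the RDF namespace. It is not JRDF's API; `ContainerTriples`, `toTriples` and the `String[]` triple representation are hypothetical. It also shows why duplicates are unproblematic for a Bag: each member hangs off a different numbered property, so two equal members still produce two distinct statements. Whether an Alt should reject duplicates is left open here.

```java
import java.util.ArrayList;
import java.util.List;

final class ContainerTriples {

    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Returns {subject, predicate, object} arrays describing a container.
    // containerType is expected to be "Bag", "Seq" or "Alt".
    static List<String[]> toTriples(String container, String containerType, List<String> members) {
        List<String[]> triples = new ArrayList<>();
        triples.add(new String[] {container, RDF + "type", RDF + containerType});
        for (int i = 0; i < members.size(); i++) {
            // Membership properties are numbered from 1: rdf:_1, rdf:_2, ...
            triples.add(new String[] {container, RDF + "_" + (i + 1), members.get(i)});
        }
        return triples;
    }

    public static void main(String[] args) {
        for (String[] t : toTriples("_:bag", "Bag", List.of("ex:a", "ex:a", "ex:b"))) {
            System.out.println(String.join(" ", t));
        }
    }
}
```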

Don't use instanceof

"Anytime you find yourself writing code of the form "if the object is of type T1, then do something, but if it's of type T2, then do something else," slap yourself."

That includes equals (unless you want to inherit equality across an interface - I tend to use casting anyway).
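
The usual alternative to an instanceof chain is double dispatch through a visitor, which is also what JRDF 0.3 added for typed nodes. The sketch below is not JRDF's actual interface; `NodeVisitorSketch` and every type inside it are hypothetical names. Each node type calls back the method meant for it, so the "if it's a T1 do this, if it's a T2 do that" logic never has to be written by hand.

```java
final class NodeVisitorSketch {

    interface NodeVisitor<R> {
        R visitUri(UriNode node);
        R visitBlank(BlankNode node);
        R visitLiteral(LiteralNode node);
    }

    interface Node {
        <R> R accept(NodeVisitor<R> visitor);
    }

    static final class UriNode implements Node {
        final String uri;
        UriNode(String uri) { this.uri = uri; }
        @Override public <R> R accept(NodeVisitor<R> visitor) { return visitor.visitUri(this); }
    }

    static final class BlankNode implements Node {
        final String id;
        BlankNode(String id) { this.id = id; }
        @Override public <R> R accept(NodeVisitor<R> visitor) { return visitor.visitBlank(this); }
    }

    static final class LiteralNode implements Node {
        final String lexicalForm;
        LiteralNode(String lexicalForm) { this.lexicalForm = lexicalForm; }
        @Override public <R> R accept(NodeVisitor<R> visitor) { return visitor.visitLiteral(this); }
    }

    // Formats a node without ever asking what type it is.
    static String describe(Node node) {
        return node.accept(new NodeVisitor<String>() {
            @Override public String visitUri(UriNode n) { return "<" + n.uri + ">"; }
            @Override public String visitBlank(BlankNode n) { return "_:" + n.id; }
            @Override public String visitLiteral(LiteralNode n) { return "\"" + n.lexicalForm + "\""; }
        });
    }

    public static void main(String[] args) {
        System.out.println(describe(new UriNode("http://example.org/a")));
        System.out.println(describe(new BlankNode("b1")));
        System.out.println(describe(new LiteralNode("hello")));
    }
}
```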

Friday, August 13, 2004

RDF in Plone

Plone as a semantic aggregator "...what if a CMS such as Plone could be turned into a universal content aggregator...Plone’s archetypes is able to import any schema specified in the form of an XMI file output by any UML modelizing editor...let’s build an RDF aggregator product from Plone. This product would retrieve any RDF file from any web site. (It would store it in the Plone’s triplestore called ROPE for instance). It would then retrieve the associated RDF-S file (and store it in the same triplestore)...it would import the RDF data as AT items conforming to the newly created AT content type..."

For the longest time I've hoped someone would take Kowari and plug it into something like Slide or SnipSnap.

Heuristic Database Integration

IBM gets heuristic in database wars "IBM is preparing to launch enterprise database technologies that can more effectively link together related information from multiple data sources, potentially eliminating some of the quality problems which can plague large data warehousing projects.

The technology, codenamed mineLink and developed at the company's research centre in Almaden, uses heuristic techniques to identify data fields which contain related information even though they may be labelled differently. For instance, a field labelled 'Surname' in one database may be labelled as 'First Name' in another, which can cause problems in integrating the data. While that example is fairly simplistic, matching fields often requires complex analysis of their contents, especially if businesses want to drill further into the collected data.

A prototype of mineLink for use in the life sciences field was demonstrated by IBM researchers as long ago as 2002. That project used existing the DiscoveryLink analytic technologies in DB2, but added additional data mining features in order to provide a unified view of complex information."

They might mean 'Surname' and 'Last Name'.

SOFA Ontology API

SOFA " The SOFA (Simple Ontology Framework API) is a Java API for modeling ontologies with Java language. It integrates ontology engineering tasks with the Java programming by providing an object model of an abstract, language-neutral ontology for Semantic Web applications, Knowledge Bases and other ontology-driven software."

Seems similar in initial concept to the WonderWeb Ontology API. However, it's much simpler with the 4 basic concepts of: Ontology, Thing, Concept and Relation. It seems to sit on top of Jena though, unlike WonderWeb's.

Vivid Display

VIVID "...enables browser-based viewing and publishing of complex OWL and RDF network structures cast into a densely-packed presentation format. Viewers can dynamically change the presentation by use of pivoting and an interactive filtering mechanism."

This is another application using Jena. It's quite easy to drop in and deploy VIVID; it just works. It's an interesting approach where XML or RDF is treated as nested-column views in a table.

For example, you can pick an RSS 1.0 feed and choose the "Item" as the top-most column. Underneath that will be the title, link, etc. You can modify the recursive depth of the columns; at about level 6 when viewing by "Item" in RSS you get all of the elements displayed, with much repetition of course. Viewing things by "Channel", you get the entire view in about 4 levels.

As with most software there are a couple of bugs:
* It seems rows are duplicated; viewing things by "Channel" seems to repeat all the items twice.
* Some of the items that link/execute "callout.js" don't seem to work.
* The filtering (greater than, less than, etc.) didn't seem to work, at least under Safari or Mozilla.
* Loading the CIA Factbook RDF takes more than a few minutes so it's worthwhile choosing your own (small) set of RDF data.

Thursday, August 12, 2004

Feedback

Your RDF Query Language? "The DAWG has recently released the 2nd draft of its Use Cases and Requirements doc, which we're encouraging people to read and comment on. This document contains an odd baker's dozen of use cases -- little stories where we think a standard RDF query language would help you get the job done. It also includes some requirements and some design objectives. The former are things that we've put into the critical path: we won't be done till they are. The latter are things which we think would be good to do, but which we aren't (yet) willing to put into the critical path. Where do things stand now? The WG is trying to figure out what kind of interest there is from Semantic Web, RDF, XML, and web service developers in a few of its proposed design objectives"

Generally, my feedback so far is "it's all good". 4.6 is what we've started to do in iTQL although it gets tricky when combined with 4.5.1.

Cyc in OWL

Cyc in OWL 60,000 assertions (24 MB). "This file takes approximately 9 hours to load into Protege."

More Databases

IBM Technologist Sees Expanded Role For Databases "In addition to that, though, the definition of a database, at least in my head, is changing dramatically and expanding from the classic structured-data-only kind of database, which is really what you're talking about in terms of Open Source. It's expanding beyond that to manage it in content and touching other data sources and accessing them in place. Those kinds of technologies are very new and not likely to become easy for anybody to build any time soon. We're starting to think of this not so much as a database management system any more but as an information management system."

All RDF all the Time

All Roads Lead to RDF "For my main topic this week I am reaching again into the world of weblogs. Web services have never made for great mailing list discussions, but there are often quite thoughtful pieces to be found on the topic on the Web. One of the more prolific writers on web services has been Mark Nottingham. He is currently employed by BEA and has been an active figure in the development of web services specifications."

"Nottingham's notion perhaps may not be too surprising to XML document-heads who wondered at the bizarre monster that is W3C XML Schema, nor to the semantic webbers who have marveled at the rush to cram all data into XML's tree-shaped structures."

Wednesday, August 11, 2004

Bert and Ernie Teach the Semantic Web

Ernie Puts Away His Toys "Bert: "Look at this mess! Ernie, don't you think it's time to put all these toys away?"

Ernie: (looks up from the fire engine) "Well yes, but I didn't get all the toys out."

Ernie: "Wellllll, how about I put most of the toys away?"

Bert: "Fine. You put most of the toys away, and I'll pick up the rest." (leaves)"

"Bert: "Ernie--ERNIE! I thought you told me you were going to put most of the toys away!"

Ernie: "I am, Bert. I'm putting away all the fire engines,--"

Bert: "Yeah?"

Ernie: "--all the big toys,--"

Bert: "Yeah?"

Ernie: "--all the red toys,--"

Bert: "Ye--" (he suddenly realizes)

Ernie: "--all the toys with wheels, and all the toys with ladders. Yessir, I'm really putting away the toys."

(Ernie marches off, leaving Bert groaning amidst the huge pile of toys still left.)"

Actually, it's more like classification and ontologies...

Friday, August 06, 2004

What's in Kowari 1.0.5?

The tracker page has been updated with the feature targets for Kowari 1.0.5.

Already completed:
* N3 parsing and output,
* Pre-fetching and configuration to answer pages (improved RMI support),
* Improved sub query performance,
* Add cache to the globalizing of local nodes (improved result speed),
* Support for a LocalSession if RMI is not enabled, and
* Client/Server JRDF implementation.

Currently being worked on:
* NOT Support,
* KModel Integration (client/server Jena),
* Cardinality Constraints,
* JRDF 0.3 support,
* A New String Pool, and
* Jena Fastpath Support.

This is scheduled for late August. If all goes well there should be a new CVS update late next week with the currently finished features and NOT, possibly along with the new string pool, cardinality and KModel integration.

The Shape of Data

The ‘Document’ in Document-Oriented Messaging "A little while back, I made a direct comparison between the two stacks that the W3C is developing; one based on the Infoset, the other on the RDF data model. It’s pretty clear to me that the RDF data model is simpler; the next step, I think, is to see if and how it (along with OWL) provides the purported benefits of XML, such as nesting, extensibility and versioning. The first of these is pretty easy (it’s a directed graph, so it’s arguably superior); the latter two are beginning to be explored."

The comparison between XML and RDF data models is very good.

Leveraging Information Goo

Making the Most of Data "To leverage unstructured data -- i.e., information that does not or cannot reside in relational databases -- technology must impose organization where none exists. Analogous to tables and schemas in relational databases, taxonomies and metadata organize and categorize the unstructured world."

"As a result, the data discovery and classification process must be easy enough to happen on a constant basis. Moreover, it must occur without an army of library scientists -- small teams of experts who are actually using the information themselves should be able to create the taxonomies, and define whether the topics and categorization schema are meaningful and important."

Linux Gazette on RDF

RDF and the Semantic Web " RDF is a framework for defining metadata; data that describes data. It was developed by the W3C, based on work by Ramanathan V. Guha, and was originally used in Netscape Navigator 4.5's Smart Browsing ("What's related?") feature, and by Open Directory. RDF followed from work Guha had done earlier, both on the Cyc project, and on Apple's Hotsauce project."

A very nice article from the pre-history of RDF to Web-of-trust, Co-depiction and DOAP.

Thursday, August 05, 2004

RDF Data Access Use Cases and Requirements Updated

RDF Data Access Use Cases and Requirements The July 2004 FTF has more details. Includes a link to XQuery-based RDF Query Languages.

Is RDF a graph at all?

This follows from discussions by Paul and Andrae about RDF being a 3-uniform directed hypergraph (or whatever the syntax is for a directed hypergraph where all the elements in the set of edges have a cardinality of 3).

There are two properties that I think prevent RDF being a hypergraph or more generally a graph:
* The edge set cannot be empty.
* That all RDF nodes are vertices.

A graph with an empty set of edges is still a 3-uniform hypergraph, because any property holds vacuously for the elements of an empty set. However, an RDF graph has no way of expressing a set of nodes other than as parts of statements. So there's no such thing as a set of RDF nodes (vertices) with an empty set of statements (edges). The RDF/XML and N3 serializations, and programming APIs like Jena, have no way of creating a graph that consists only of a set of nodes.

The other problem, blank nodes, also stems from the fact that nodes don't exist outside statements. A blank node doesn't exist in the set of nodes unless it exists in a statement first; it is specifically there to be a placeholder in a statement. What you seem to do in RDF is take a set of statements (edges) and place all the unique items into a set of nodes (vertices). This is the opposite of the approach you take with a graph. The definition of a hypergraph is: "an ordered pair (V, {E}) where V is a set of vertices and {E} is a set of edges such that {E} is a subset P(V) (power set of V)."
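The difference in direction can be sketched in a few lines of Java. This is only an illustration under assumptions: the class and method names (StatementCentric, nodesOf, edgesDrawnFrom, isThreeUniform) are made up for the example and are not part of Jena or Kowari, statements are modelled as plain lists of three strings, and predicates are counted as nodes, as in the hypergraph view above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; these names are not part of Jena or Kowari.
class StatementCentric {

    // RDF style: the node set is whatever appears in the statements.
    static Set<String> nodesOf(List<List<String>> statements) {
        Set<String> nodes = new LinkedHashSet<>();
        for (List<String> statement : statements) {
            nodes.addAll(statement);
        }
        return nodes;
    }

    // Graph style: the vertex set comes first and every edge must be drawn from it.
    static boolean edgesDrawnFrom(Set<String> vertices, List<List<String>> edges) {
        for (List<String> edge : edges) {
            if (!vertices.containsAll(edge)) {
                return false;
            }
        }
        return true;
    }

    // 3-uniform: every edge has exactly three positions (vacuously true with no edges).
    static boolean isThreeUniform(List<List<String>> edges) {
        for (List<String> edge : edges) {
            if (edge.size() != 3) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<String>> statements = Arrays.asList(
            Arrays.asList("ex:paul", "foaf:knows", "ex:andrae"),
            Arrays.asList("_:b1", "foaf:name", "\"Andrae\""));
        List<List<String>> none = Collections.emptyList();

        // Nodes, including the blank node _:b1, only appear via statements.
        System.out.println(nodesOf(statements));
        // No statements means no nodes at all.
        System.out.println(nodesOf(none).isEmpty());
        // A hypergraph is happy with vertices and no edges.
        Set<String> isolated = new HashSet<>(Arrays.asList("x", "y"));
        System.out.println(edgesDrawnFrom(isolated, none) && isThreeUniform(none));
    }
}
```

In the graph-style check the vertices "x" and "y" exist with no edges, while no list of statements passed to nodesOf can produce a node that is not already inside a statement.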

So RDF seems to really be a network - an application of a graph. I think this is part of the confusion people have when they try to visualize RDF. If I was to characterise it: RDF is statement/edge centric and graphs are node/vertex centric. So a good visualization is one of statements rather than nodes - so the approach should be more faceted than ball-and-stick.

Copyright not patents

Open Source Against Software Patents "It is worth taking a look at the history of databases to understand why copyrights, rather than patents, are the right form of protection.

In 1970, Edgar Codd, an IBM computer scientist, wrote a number of papers which developed the idea of a new form of relational database which went beyond the then current hierarchical and network database models. Codd's paper was instrumental in the development of IBM's prototype relational database known as System R (and ultimately in the development of IBM's DB2 database), as well as Oracle, the first commercial relational database. Codd's paper was read by computer scientists Michael Stonebraker and Eugene Wong at the University of California at Berkeley, who subsequently developed the Ingres database. System R and Ingres in turn inspired the development of virtually all commercial relational databases, including those from Sybase, Informix, Tandem, and even Microsoft's SQL Server. SQL (Structured Query Language) became a de facto standard as well as an official standard published by ANSI in 1986 and ratified by ISO in 1987. And then in the 1990s, we saw the emergence of open source databases such as MySQL, Postgres, Firebird, and others. All of these databases benefit from the development of standards that ensure compatibility and interoperability...Every time you search on the Internet, shop online, or book a reservation, you're using database technology. The database industry has grown successfully because it was not locked up in proprietary patents.

Of course, the history of database technology is hardly unique. The same story could be told about spreadsheets, word processors, e-mail systems, graphical user interfaces, electronic shopping carts, search engines, and even the Internet itself. Tim Berners-Lee, the inventor of the World Wide Web, says software patents have "run amok.""

Wednesday, August 04, 2004

Continuations

Native Java Continuations "...all of these things can be done without continuations in the JVM, they just get more convenient when you do have them. In the small is where the ideas really shine -- coroutines, generators, certain types of graph traversals, error recovery (in languages which support continuations you get java's try/catch being a trivial thing to implement, and you get other niceties like "try ... catch { retry two times; fail } finally ... end"."

Hmm, graph traversal.
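As a rough sketch of the "can be done without continuations, just less conveniently" point: a lazy depth-first traversal in plain Java has to carry its own stack and visited set between calls, which is the bookkeeping a continuation or generator would otherwise keep for you. The class name DepthFirstIterator, the walk helper, and the adjacency-map representation are all assumptions made for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

// Hypothetical illustration: a lazy depth-first traversal written as a plain Iterator.
class DepthFirstIterator implements Iterator<String> {
    private final Map<String, List<String>> edges;
    // Explicit stack standing in for the call stack a continuation would capture.
    private final Deque<String> pending = new ArrayDeque<>();
    private final Set<String> seen = new HashSet<>();

    DepthFirstIterator(Map<String, List<String>> edges, String start) {
        this.edges = edges;
        pending.push(start);
    }

    @Override
    public boolean hasNext() {
        // Discard entries that were already visited via another path.
        while (!pending.isEmpty() && seen.contains(pending.peek())) {
            pending.pop();
        }
        return !pending.isEmpty();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String node = pending.pop();
        seen.add(node);
        for (String neighbour : edges.getOrDefault(node, Collections.<String>emptyList())) {
            if (!seen.contains(neighbour)) {
                pending.push(neighbour);
            }
        }
        return node;
    }

    // Convenience helper: drain the iterator into a list.
    static List<String> walk(Map<String, List<String>> edges, String start) {
        List<String> order = new ArrayList<>();
        Iterator<String> it = new DepthFirstIterator(edges, start);
        while (it.hasNext()) {
            order.add(it.next());
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> edges = new HashMap<>();
        edges.put("a", Arrays.asList("b", "c"));
        edges.put("b", Arrays.asList("d"));
        edges.put("c", Arrays.asList("d"));
        // Each reachable node is produced once, on demand.
        System.out.println(walk(edges, "a"));
    }
}
```

The caller can stop after any call to next() and resume later; all of the state needed to do that lives in the pending and seen fields rather than in a suspended stack frame.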

Stupid Laws

Click at your own risk "Anyone who has copied songs from a CD onto an iPod or computer hard drive has fallen foul of Australian copyright laws, which critics argue are failing to keep pace with technological change. Copying music for personal use is generally OK in the US and Europe. But not in Australia."

"Songwriters and publishers want to change the law and pay for the copies through levies on digital music players and blank CDs. The record labels - which own the recordings - want the law to stay."

It also mentions the free trade agreement (FTA) with America and bringing the DMCA to Australia. EFA has their submission about the FTA's effects. Slashdot has Australia to Get Software Patents and Anti-Circumvention Laws. Andrew Tridgell, author of Samba, has attacked the FTA too.

Tuesday, August 03, 2004

Setting SAIL

A new Sail for Sesame "This document describes the design of a new Sail for Sesame. [The] Goal for this Sail implementation is to offer a scalable and fast persistent repository for RDF data that does not need third-party applications like databases. More specifically: one should not need to install any additional software to be able to use this Sail. This does not include LGPL-compatible, embeddable Java components. Main reasons for coming up with a new Sail are that the currently available memory Sail isn't scalable enough when limited memory is available, and the RDBMS Sail is both complicated to install and too slow in aspects like adding and removing statements."

Kowari is actually a svelte 2.4MB (or so) when Jena, Jetty, and everything else is removed. Without Jena, however, you can't parse RDF. If you are only interested in storing longs, that gets down to about 1.5MB. I mentioned Paul's earlier ideas on various persistent triple stores.

Monday, August 02, 2004

OS Cloudscape

IBM to make Java database open source "Big Blue is expected to detail the open-source initiative, code-named Derby, according to a source familiar with IBM's plans. The software will be governed by the open-source Apache Software License and stewarded by the Apache Software Foundation, the source said."

"Cloudscape is a niche product in IBM's overall data information line and has tiny market share compared with its multibillion-dollar DB2 franchise. IBM has used Cloudscape as an embedded data store as part of its Workplace desktop application line."

"The decision to release Cloudscape into open source mimics moves by other proprietary software companies, which have created open-source projects around existing products in an effort to generate more interest in the product and make it easier for programmers to access it. At LinuxWorld next week, Computer Associates International will release its Ingres r3 database, a product with limited market share, into open source."

"Putting an existing product into open source is not a surefire recipe for stimulating usage or sales, said Michael Olson, president and CEO of Sleepycat Software, which offers its own open-source database."

Google and the Semantic Web

How Google Will Have Achieved The Semantic Web This is old and has been contradicted by Google's founders themselves:

"He [Sergey Brin] basically said he doesn't believe in the semantic web as a set of linked RDF data-structures. His basic argument is that the structure of natural language and what it presents is much much richer than meta-data tagging schemes. Clearly, Google's understanding of natural language is unique, but there still is a need for machine readable APIs for data on the Internet."

The ideas that Paul Ford puts forward are pretty interesting and illustrate the usefulness of some of the ideas of the Semantic Web; except maybe the reliance on a central authority like Google. The Semantic Web is like the Web, not like Google.

Also, I (well Google) found a new link to "The Anatomy of a Large-Scale Hypertextual Web Search Engine" written by Sergey Brin and Lawrence Page for WWW7.

July 2004 issue of SIGSEMIS

I knew this was coming, but my lack of an internet connection meant I couldn't check it or read it. The next issue is ready for download in PDF format.

Includes an interview with Eric Miller:
"Freeing the data from the applications that created them and managing this information directly relates to a strong return on investment. The predominant skepticism I hear is perhaps the most is 'if I have XML why do I need RDF'. It's interesting however to see some of the skepticism dissipates after organizations learn from experiences (often times painful ones) that agreement on syntactic conventions are often overly brittle and not adequate for the effective management of data."

"The WWW2004 conference had a similar impact on me with regards to the Semantic Web. The technologies and toolkits are maturing. Semantic Web applications are becoming far more prevalent. Novel ideas for how these technologies may be used are happening on a daily basis. It was quite a week!"

Danny Ayers' "The Missing Webs":
"...there is a lot missing from the current Web. Those gaps can be filled in part using a logic-based framework."

It also includes lots of KM-based articles. The ones I found interesting: "The Road Ahead to Competency-Based Learning Activity Selection: A Semantic Web Perspective", "Reflection on the future of knowledge portals", and "Methodologies for the Semantic Web: state-of-the-art of ontology methodology".

The book review, "Developing Semantic Web Services", has a link to Semantic Web Author a "...Multi-Markup Language (XML, RDF, and OWL) Validating Parser, Editor, and Web Development Environment."

What real hackers do

Great Hackers "If companies want hackers to be productive, they should look at what they do at home. At home, hackers can arrange things themselves so they can get the most done. And when they work at home, hackers don't work in noisy, open spaces; they work in rooms with doors. They work in cosy, neighborhoody places with people around and somewhere to walk when they need to mull something over, instead of in glass boxes set in acres of parking lots. They have a sofa they can take a nap on when they feel tired, instead of sitting in a coma at their desk, pretending to work. There's no crew of people with vacuum cleaners that roars through every evening during the prime hacking hours. There are no meetings or, God forbid, corporate retreats or team-building exercises. And when you look at what they're doing on that computer, you'll find it reinforces what I said earlier about tools. They may have to use Java and Windows at work, but at home, where they can choose for themselves, you're more likely to find them using Perl and Linux."

Yeah, real programmers go home and boot up Linux and write some really neat regexes in Perl. Yeah ;-).