Sunday, October 09, 2005

JRDF 0.3.4.1 Out

Now available for download.

Some thoughts related to this:
* To ensure valid refactorings and bug fixes, it seems sensible to go back and do some things properly. That means the next highest priority will have to be writing an N-Triples parser to validate the RDF Test Cases. The SableCC grammar for N-Triples is already done.
* While the grammar is a good start, it's not nearly fine-grained enough to actually write the code and have fast-running unit tests. Tom has done some of this, test-driving a parser in his SPARQL work, although I suspect I may do it a little differently (using EasyMock).
* Graph equality and isomorphism are a pain, which is to say that they're complicated by blank nodes (see the test cases). "Matching RDF Graphs" lists possible ways to do mappings, including one where a blank node may map to a labelled node. I know that in Kowari loading the sample FOAF files twice into the same graph adds the blank nodes twice. While this seems like a mistake it is the fastest way to load a graph. Maybe implementing different loading modes (no duplicate checking, blank node to blank node, and blank node to labelled node) and signing graphs are two ways to help with this.
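
To make the blank node problem concrete, here is a minimal sketch in Python. It is not JRDF or Kowari code: the triple representation (tuples of strings, with blank nodes written N-Triples style as `_:label`), the `load` and `isomorphic` functions, and the sample FOAF-like data are all assumptions made up for illustration. It shows a "no duplicate checking" load that adds the blank nodes again on every load, plus a brute-force isomorphism check that tries every blank node to blank node mapping. The blank node to labelled node mapping is not covered.

```python
from itertools import permutations


def is_blank(term):
    # Assumption: blank nodes are written N-Triples style, e.g. "_:a".
    return term.startswith("_:")


def blank_nodes(graph):
    # Collect every blank node label that appears anywhere in the graph.
    return sorted({t for triple in graph for t in triple if is_blank(t)})


def isomorphic(g1, g2):
    """Brute force: look for a bijection between the blank nodes of g1 and g2
    that turns g1 into g2. Factorial in the number of blank nodes."""
    b1, b2 = blank_nodes(g1), blank_nodes(g2)
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    for perm in permutations(b2):
        mapping = dict(zip(b1, perm))
        renamed = {tuple(mapping.get(t, t) for t in triple) for triple in g1}
        if renamed == g2:
            return True
    return False


def load(graph, triples, load_id):
    """'No duplicate checking' mode: blank node labels are scoped to one load,
    so each load allocates fresh blank nodes (hypothetical naming scheme)."""
    for triple in triples:
        graph.add(tuple(f"{t}_{load_id}" if is_blank(t) else t for t in triple))


# Hypothetical sample data standing in for a small FOAF file.
foaf = [
    ("_:a", "foaf:name", '"Andrew"'),
    ("_:a", "foaf:knows", "_:b"),
    ("_:b", "foaf:name", '"Paula"'),
]

graph = set()
load(graph, foaf, 1)
load(graph, foaf, 2)
print(len(graph))  # 6: the second load added the blank nodes again

# The same statements with different blank node labels are still isomorphic.
relabelled = {
    ("_:y", "foaf:name", '"Andrew"'),
    ("_:y", "foaf:knows", "_:x"),
    ("_:x", "foaf:name", '"Paula"'),
}
print(isomorphic(set(foaf), relabelled))  # True
```

A "blank node to blank node" loading mode would have to run something like `isomorphic` (or a smarter matching algorithm) against what is already stored before inserting, which is where the cost comes from.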

2 comments:

Paula said...

...in Kowari loading the sample FOAF files twice into the same graph adds the blank nodes twice. While this seems like a mistake it is the fastest way to load a graph.

I'm glad you only said "seems like a mistake", because this behaviour is required by the RDF semantics. Note that it is entirely legal to map two (or more) blank nodes to the same name.

Of course, it is desirable to automatically find which blank nodes are the same as each other (owl:sameAs). In fact, I think it would be great if we could eliminate all redundant blank nodes. After all, the semantics allow for an infinite number of distinct blank nodes for every named node in the system... which would cause significant problems when stored in a real database.

Andrew said...

I'm just going to show how I misunderstand both the RDF and Ruby (and other languages) here...

Getting duplicate blank nodes from the same statements in the same file "seems like a mistake" to me in a way that is a lot like "duck typing": it walks like a duck, it quacks like a duck, so it is a duck.

I guess I'm just arguing that returning "infinity" to a count of statements on a graph with blank nodes does not seem quite right.

Ideally, it means that it isn't the default behaviour in APIs and query languages.

Unfortunately, this behaviour is also the easiest thing to implement.