Thursday, March 09, 2006

No Snappy Title

I posted this to the JRDF list but it hasn't come up on the archives - so it's here as well. I'm using this for a current project I'm doing at Uni - I don't know if it's a good topic but at least it's one I can complete in a sane amount of time and isn't completely reliant on working code.

Here's a brief description of the changes I've made to the relational layer of JRDF. Basically, it now tries to closely follow the concepts of relations and tuples. So far, I think it's closer than previous attempts such as "A relational algebra for SPARQL". One of the ideas that I keep coming back to is that having duplicates is indicative of using bags/multisets rather than sets, and RDF is all about sets. There's a whole stream of research on the power of bag languages ("Query Languages for Bags", which claims bag-oriented languages can't do transitive closure, for example) that I've yet to look at more fully.

Relational Tuples

Components:
  • Type name - integer, char, sno, name.
  • Attribute Name - status, city, sno, sname.
  • Attributes - status:integer, city:char, sno:sno, sname:name
  • Attribute:Value - sno sno('s1'), sname name('smith'), status 20, city 'london'
  • Heading - sno sno, sname name, status integer, city char.

Proposed Types for RDF

Basic interface:
  • isAssignableFrom - returns true if this type is the same as, or a super-type of, the given type. Similar to Java's and Rel's.
  • getName - the name of the type.

Type hierarchy:
  • Object -> Subject -> Predicate. Meaning that Object is a super-type of Subject, which is a super-type of Predicate. This allows joining columns of different but compatible types.
  • URI Reference, Literal and BNode are mutually incompatible types - you won't be able to join these.

They are all nodes. In the future this will allow selecting certain types or certain operations to be performed only on certain types.
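The interface and hierarchy above can be sketched roughly as follows. This is only an illustration under my own assumptions: the names NodeType, PositionType and ValueType are made up for the example and are not JRDF's actual classes, and the "depth" number is just one simple way to encode Object -> Subject -> Predicate.

```java
// Hypothetical sketch; none of these names come from JRDF itself.
interface NodeType {
    // The name of the type, e.g. "subject".
    String getName();

    // True if this type is the same as, or a super-type of, the given type.
    boolean isAssignableFrom(NodeType other);
}

// Positional types: Object is a super-type of Subject, which is a super-type of Predicate.
enum PositionType implements NodeType {
    OBJECT("object", 0), SUBJECT("subject", 1), PREDICATE("predicate", 2);

    private final String typeName;
    private final int depth; // 0 is the most general type

    PositionType(String typeName, int depth) {
        this.typeName = typeName;
        this.depth = depth;
    }

    @Override
    public String getName() {
        return typeName;
    }

    @Override
    public boolean isAssignableFrom(NodeType other) {
        // A more general (shallower) type accepts itself and anything below it.
        return other instanceof PositionType && depth <= ((PositionType) other).depth;
    }
}

// Value types: each one is only compatible with itself.
enum ValueType implements NodeType {
    URI_REFERENCE("uri"), LITERAL("literal"), BNODE("bnode");

    private final String typeName;

    ValueType(String typeName) {
        this.typeName = typeName;
    }

    @Override
    public String getName() {
        return typeName;
    }

    @Override
    public boolean isAssignableFrom(NodeType other) {
        return this == other;
    }
}
```

With this, PositionType.OBJECT.isAssignableFrom(PositionType.SUBJECT) is true, so an object column can be joined with a subject column, while ValueType.LITERAL.isAssignableFrom(ValueType.BNODE) is false.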

Proposed JRDF Tuples

Components:
  • Types - subject, predicate, object, uri, literal, bnode. As defined above.
  • Attribute name - variable name or default name.
  • Attribute - ?s:subject, P1:predicate, O1:object, P2:predicate, ?p:object, P3:predicate, ?city:object
  • Attribute:Value - ?s:subject(#s1), P1:predicate(#name), O1:object('smith'), ?p:predicate(#p1)
  • Heading - ?s subject, P1 predicate, O1 object.


Proposed JRDF Relation

Components:
  • Heading/Attributes - set of attributes.
  • Body/Tuples - set of tuples


An aspect of this is that the heading of the relation doesn't modify the type in the attribute of the tuple. This means you always know which position in the graph a value came from. It used to bug me in Kowari/TKS that the underlying layers didn't have this information.
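To make the structure concrete, here is a minimal sketch of an attribute, a tuple and a relation. The class names and the use of plain strings for types and values are my own simplifications for the example, not the real JRDF code. The things to notice are that the heading and body are both sets, so adding the same tuple twice leaves one copy, and that each attribute carries its positional type alongside its name.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical classes for illustration; these are not JRDF's actual classes.
final class Attribute {
    final String name; // variable name such as "?sno", or a default name such as "P1"
    final String type; // "subject", "predicate" or "object": the position in the triple

    Attribute(String name, String type) {
        this.name = name;
        this.type = type;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Attribute)) {
            return false;
        }
        Attribute other = (Attribute) o;
        return name.equals(other.name) && type.equals(other.type);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, type);
    }

    @Override
    public String toString() {
        return name + ":" + type;
    }
}

final class Relation {
    private final Set<Attribute> heading;
    // A tuple is a map from attribute to value; the body is a set, so duplicates collapse.
    private final Set<Map<Attribute, String>> body = new LinkedHashSet<>();

    Relation(Set<Attribute> heading) {
        this.heading = Collections.unmodifiableSet(new LinkedHashSet<>(heading));
    }

    void add(Map<Attribute, String> tuple) {
        // Every tuple must have exactly the attributes named in the heading.
        if (!tuple.keySet().equals(heading)) {
            throw new IllegalArgumentException("Tuple does not match heading " + heading);
        }
        body.add(Collections.unmodifiableMap(new LinkedHashMap<>(tuple)));
    }

    Set<Attribute> heading() {
        return heading;
    }

    Set<Map<Attribute, String>> body() {
        return Collections.unmodifiableSet(body);
    }
}
```

Because the type lives in the attribute rather than being inferred from the heading, a value bound to ?sno:subject is still known to have come from the subject position after it has been joined or projected.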

Example

Graph:
S1:subject  P1:predicate  O1:object
s1          #sno          s1
s1          #sp           p1
s1          #sp           p2
s2          #sp           p1
s2          #sp           p2
p1          #city         'London'
p2          #city         'Paris'

Query:


select ?sno ?pno ?city
...
where ?sno #sno s1
?sno #sp ?pno
?pno #city ?city

First Relation:

?sno:subject  P1:predicate  O1:object
s1            #sno          s1

Second Relation:

?sno:subject  P2:predicate  ?pno:object
s1            #sp           p1
s1            #sp           p2
s2            #sp           p1
s2            #sp           p2

Third Relation:

?pno:subject  P3:predicate  ?city:object
p1            #city         'London'
p2            #city         'Paris'

First and Second:

?sno:subject  P1:predicate  O1:object  P2:predicate  ?pno:object
s1            #sno          s1         #sp           p1
s1            #sno          s1         #sp           p2

First and Second and Third:

?sno:subject  P1:predicate  O1:object  P2:predicate  ?pno:object  P3:predicate  ?city:object
s1            #sno          s1         #sp           p1           #city         'London'
s1            #sno          s1         #sp           p2           #city         'Paris'

After Project:

?sno:subject  ?pno:object  ?city:object
s1            p1           'London'
s1            p2           'Paris'
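The whole example can be replayed with a small sketch of a natural join followed by a project. To keep it short I've made some simplifying assumptions: tuples are plain maps keyed by attribute name only, the types (and the compatibility check between ?pno:object and ?pno:subject) are left out, and JoinSketch and its methods are names invented for this example rather than anything in JRDF.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: relations are sets of tuples, tuples map attribute names to values.
final class JoinSketch {
    // Builds one tuple from alternating attribute names and values.
    static Map<String, String> tuple(String... nameValuePairs) {
        Map<String, String> t = new LinkedHashMap<>();
        for (int i = 0; i < nameValuePairs.length; i += 2) {
            t.put(nameValuePairs[i], nameValuePairs[i + 1]);
        }
        return t;
    }

    @SafeVarargs
    static Set<Map<String, String>> relation(Map<String, String>... tuples) {
        return new LinkedHashSet<>(Arrays.asList(tuples));
    }

    // Natural join: combine every pair of tuples that agree on all shared attribute names.
    static Set<Map<String, String>> join(Set<Map<String, String>> left,
                                         Set<Map<String, String>> right) {
        Set<Map<String, String>> result = new LinkedHashSet<>();
        for (Map<String, String> l : left) {
            for (Map<String, String> r : right) {
                if (agree(l, r)) {
                    Map<String, String> merged = new LinkedHashMap<>(l);
                    merged.putAll(r);
                    result.add(merged);
                }
            }
        }
        return result;
    }

    private static boolean agree(Map<String, String> l, Map<String, String> r) {
        for (Map.Entry<String, String> e : l.entrySet()) {
            String other = r.get(e.getKey());
            if (other != null && !other.equals(e.getValue())) {
                return false;
            }
        }
        return true;
    }

    // Project: keep only the named attributes; the result is a set, so duplicates collapse.
    static Set<Map<String, String>> project(Set<Map<String, String>> rel, String... names) {
        Set<Map<String, String>> result = new LinkedHashSet<>();
        for (Map<String, String> t : rel) {
            Map<String, String> projected = new LinkedHashMap<>();
            for (String name : names) {
                projected.put(name, t.get(name));
            }
            result.add(projected);
        }
        return result;
    }

    // The three relations from the example, joined and then projected.
    static Set<Map<String, String>> example() {
        Set<Map<String, String>> first = relation(
                tuple("?sno", "s1", "P1", "#sno", "O1", "s1"));
        Set<Map<String, String>> second = relation(
                tuple("?sno", "s1", "P2", "#sp", "?pno", "p1"),
                tuple("?sno", "s1", "P2", "#sp", "?pno", "p2"),
                tuple("?sno", "s2", "P2", "#sp", "?pno", "p1"),
                tuple("?sno", "s2", "P2", "#sp", "?pno", "p2"));
        Set<Map<String, String>> third = relation(
                tuple("?pno", "p1", "P3", "#city", "?city", "'London'"),
                tuple("?pno", "p2", "P3", "#city", "?city", "'Paris'"));
        return project(join(join(first, second), third), "?sno", "?pno", "?city");
    }

    public static void main(String[] args) {
        for (Map<String, String> t : example()) {
            System.out.println(t);
        }
    }
}
```

The s2 tuples drop out in the first join because the first relation only binds ?sno to s1, which leaves the two rows shown after the project.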
