Showing posts with label relational model. Show all posts

Monday, December 13, 2010

What's a Dual?

Erik Meijer has been giving a talk at YOW in Melbourne and at QCon in San Francisco about key-value stores being the dual of SQL.

The obvious question is: what is a dual? And like most really good questions, you already know the answer. The set operations AND and OR are dual. Which means that if A OR F = A is true, then by replacing OR with AND and F with T (or, as Erik said, switching the arrows), you get A AND T = A. Likewise, A OR T = T becomes A AND F = F.
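The duality can be checked exhaustively for the two-element Boolean algebra; a throwaway sketch (the class and method names are mine):

```java
// A tiny exhaustive check of the AND/OR duality described above:
// swapping OR <-> AND and F <-> T in each identity yields the other.
public class BooleanDuality {
    static boolean identitiesHold() {
        for (boolean a : new boolean[]{false, true}) {
            if ((a || false) != a) return false;      // A OR F = A
            if ((a && true) != a) return false;       // dual: A AND T = A
            if ((a || true) != true) return false;    // A OR T = T
            if ((a && false) != false) return false;  // dual: A AND F = F
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(identitiesHold() ? "all identities hold" : "broken");
    }
}
```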

This is the bit where Maths almost becomes magical. Why should these properties hold? Why doesn't this work for multiplication and addition?

In his talk, Erik quoted Benjamin Black as saying that the noSQL movement took away the relational model and gave nothing back. His talk also seems to be about the beginnings of a noSQL/coSQL algebra and a project to write a noSQL database within Microsoft (noSQL Server, perhaps?). He's not the only one thinking about these kinds of things, obviously.

He also mentioned closure as an important property of numbers and other things. That is, you plug in 1 + 1 and get 2, then you can plug 2 into + 1 and get 3, and so on - the answer of one operation can be used as the input to the next. You can't do that with SQL or SPARQL, which in my mind is a big fail. Erik agrees - and suggested that noSQL (or coSQL) should support this property. He's not the only one to think that noSQL should support the Unix philosophy (which isn't just closure, but closure is an important part of it). And he's certainly not going to be the last.

He described these properties in the same way as I discovered them in RDF, which was through graphs. The object model in LINQ has a graph of objects - objects which don't have a globally identifiable name - much like blank nodes. The cool thing about LINQ, well one of the many cool things, is that it supports closure - you query your object graph, you get back another object graph, and you can continue to plug away at it as you narrow your query. This certainly makes a lot of sense from a user querying data perspective (among others) - you start with all students, then ask for all male students, then ask for all male students with a grade point average above 5, or whatever. Narrower and narrower is one typical way people find the answer they're looking for.
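That narrowing style is easy to sketch with Java streams, which are closed in the same way - each filter returns another stream. The Student record and the data here are made up for illustration:

```java
import java.util.List;

public class Narrowing {
    // Hypothetical student record, for illustration only.
    record Student(String name, String sex, double gpa) {}

    // All students -> male students -> male students with GPA above 5.
    // Each filter's output is another stream, so the query narrows step by step.
    static List<String> maleAbove5(List<Student> students) {
        return students.stream()
            .filter(s -> s.sex().equals("M"))
            .filter(s -> s.gpa() > 5)
            .map(Student::name)
            .toList();
    }

    public static void main(String[] args) {
        List<Student> students = List.of(
            new Student("Alice", "F", 6.5),
            new Student("Bob", "M", 4.0),
            new Student("Dave", "M", 5.5));
        System.out.println(maleAbove5(students)); // only Dave survives both filters
    }
}
```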

The other thing he mentions is that SQL is a nightmare due to normalization, the complexity of queries, null semantics and computing over them and the difference between objects and tables. Erik showed that objects and relations are dual.

This is the bit where I introduce monads and say it's like pizza delivery or something. Screw that. It's just good in that if your operations follow laws then they can be mapped to other things and inherit any work done to make those things good - like optimisation or scaling. In Erik's talk that meant finding out cool things like noSQL being dual to SQL, and LINQ being implementable as MapReduce (see this slide of Erik's). This works because LINQ operations are associative and MapReduce operations are also associative. It's not just a good idea, it's a monadic law. If you can, read 10 pages on category theory (I'm only up to page 8 though).
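A small sketch of why associativity matters (the example and numbers are mine, not Erik's): an associative reduce can be chopped into chunks, computed in parallel and recombined in any grouping - which is essentially the MapReduce trick:

```java
import java.util.stream.IntStream;

public class AssociativeReduce {
    // Sum is associative, so the runtime is free to split the work
    // (as MapReduce does across machines) and combine partial results
    // in any grouping without changing the answer.
    static int sequential(int n) {
        return IntStream.rangeClosed(1, n).reduce(0, Integer::sum);
    }

    static int parallel(int n) {
        return IntStream.rangeClosed(1, n).parallel().reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        // Both groupings of the same associative operation agree.
        System.out.println(sequential(100) + " == " + parallel(100));
    }
}
```

Swap in a non-associative accumulator (say, subtraction) and the parallel result is no longer guaranteed to match - which is why the law matters, not just the idea.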

Obviously, not everything is peaches. The noSQL/coSQL people say don't do joins - they don't scale - you can't self-join on the web, for example. In a MapReduce/noSQL system you either end up querying multiple times or you end up processing the data again. I'd suggest that someone will write a library that does multiple queries and joins over noSQL databases so that you don't have to do it yourself - maybe locally and in memory at first, and then with disk-based structures for scalability later - I guess this is where LINQ would fit in.

There were lots of dual properties in the relational vs noSQL/coSQL talk. I only remember a few: closed world vs open world, identity by value vs identity by reference (or properties), and (I might be wrong) transactional vs non-transactional.

So with our arrows in hand, ignoring how cool the world would be without JOINs and UNIONs, and with our mighty knowledge of set theory, what would JOIN and UNION look like if they were dual and if such things existed as 0 and U for relations? U JOIN R = R becomes 0 UNION R = R, and likewise U UNION R = U becomes 0 JOIN R = 0 (which matches my existing prejudices).

Just to put the cherry on this pig of a blog post (to mix metaphors), objects and functions look like duals too (given that people claim there's no agreed meaning behind what an object is or what a dual is, I feel safe anyway):

But from another perspective, the apply "method" of a closure can be used as a low-level method dispatch mechanism, so closures can be, and are, used to implement very effective objects with multiple methods...With closures seen as a building block with which to implement objects, it's clear that objects are a poor man's closures...If your language only has these restricted closures, and you're forced to build an object system on top of them, it's clear that closures are a poor man's objects.

Erik via Eugene Wigner said there's an unreasonable effectiveness of mathematics and I have to agree. Monads, duals, properties for LINQ, noSQL, relational databases, SPARQL, and so on. Very unreasonable and very cool.


Sunday, March 02, 2008

Algebra A and SPARQL

I've been reading "Logic and Databases: The Roots of Relational Theory" and, more importantly, chapter 10, which is about "Why is it called a Relational Algebra?". He defines what an algebra is - identities, idempotence, absorption and so on - with respect to Algebra A. I first came across Algebra A in The Third Manifesto; it's an untyped relational algebra that defines a relationally complete system in about three operations: REMOVE, NOR or NAND, and TCLOSE (transitive closure). I say "about three" because I'm not sure TCLOSE is part of a relationally complete system, and NOR and NAND are made up of an untyped OR or AND plus NOT. It also gets rid of WHERE, EXTEND and SUMMARIZE (which can be used for aggregate functions) by creating "relational operators" - special relations that perform an operation (like COUNT).

Anyway, one of the more interesting points, on pages 260-261 of "Logic and Databases", is where he talks about identities such as: A + 0 = A * U = A, A + U = U and A * 0 = 0, where A is any relation, 0 is the empty relation (DUM) and U is the universal relation (DEE). These match the tables I created for JOIN and UNION in SPARQL - and likewise I think they are correct for OPTIONAL.
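Some of those identities are easy to check in a toy model of relations as sets of attribute-value maps, with DEE as the relation containing one empty tuple and DUM as the empty relation. This is a sketch of the idea, not Algebra A itself:

```java
import java.util.*;

public class DeeDum {
    // A relation sketched as a set of tuples; a tuple as an attribute->value map.
    static Set<Map<String, Object>> join(Set<Map<String, Object>> r,
                                         Set<Map<String, Object>> s) {
        Set<Map<String, Object>> out = new HashSet<>();
        for (Map<String, Object> t1 : r)
            for (Map<String, Object> t2 : s) {
                boolean ok = true;
                for (String k : t1.keySet())
                    if (t2.containsKey(k) && !Objects.equals(t1.get(k), t2.get(k)))
                        ok = false;
                if (ok) {
                    Map<String, Object> merged = new HashMap<>(t1);
                    merged.putAll(t2);
                    out.add(merged);
                }
            }
        return out;
    }

    static Set<Map<String, Object>> union(Set<Map<String, Object>> r,
                                          Set<Map<String, Object>> s) {
        Set<Map<String, Object>> out = new HashSet<>(r);
        out.addAll(s);
        return out;
    }

    static final Set<Map<String, Object>> DUM = Set.of();          // 0: no tuples
    static final Set<Map<String, Object>> DEE = Set.of(Map.of());  // U: one empty tuple

    public static void main(String[] args) {
        Set<Map<String, Object>> a =
            Set.of(Map.<String, Object>of("x", 1), Map.<String, Object>of("x", 2));
        System.out.println(join(a, DEE).equals(a));   // A * U = A
        System.out.println(union(a, DUM).equals(a));  // A + 0 = A
        System.out.println(join(a, DUM).equals(DUM)); // A * 0 = 0
    }
}
```

(A + U = U needs the untyped union of Algebra A, which this typed toy doesn't capture.)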

There is also a chapter on the closed world assumption and why Date dislikes the open world assumption which I'm still trying to digest. It seems that to get around 3VL Date uses strings - which seems like a massive hack.

It also occurred to me when reading this that SPARQL, and query languages in general, are non-monotonic - that is, as you add more information the results you get from a query can change - which is different from RDF itself. It made me wonder what a monotonic query language would look like, but not for too long.
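A toy illustration of the non-monotonicity (the data and property names are made up): any query involving negation can lose answers as data is added:

```java
import java.util.*;

public class NonMonotonic {
    // A query with negation: subjects that have no "email" property.
    // Adding more data can shrink the answer set, so the query is non-monotonic.
    static Set<String> withoutEmail(Set<String> people, Map<String, String> emails) {
        Set<String> out = new TreeSet<>(people);
        out.removeAll(emails.keySet());
        return out;
    }

    public static void main(String[] args) {
        Set<String> people = Set.of("alice", "bob");
        Map<String, String> emails = new HashMap<>();
        System.out.println(withoutEmail(people, emails)); // [alice, bob]
        emails.put("alice", "a@example.org");             // more information...
        System.out.println(withoutEmail(people, emails)); // [bob] - an answer vanished
    }
}
```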

Friday, November 16, 2007

SPARQL isn't Unix Pipes

It wasn't supposed to be this way. I was just trying to get what I wrote in 2004 acknowledged. All I wanted then, as now, was aggregate functions and a matching data model. I did a bit more research in 2006 (mostly in my spare time, while I had a day job to go to) and thought that people could read it and understand it. I even spent some time over the last Christmas holidays making a gentler introduction to it all.

SPARQL is a proposed recommendation - which is one step away from being a published standard. So I put my objections to the SPARQL working group. From what I can tell, people either didn't understand or thought that I was some kind of weirdo. The unhappiest part of this is the summary of my objection: "The Working Group is unfamiliar with any existing query languages that meet the commenter's design goals."

All I wanted was closure of operations, where the query language matches the data model it queries. Maybe this is a very odd thing to want. No one seems to know what the relational model really is either. Maybe it's a bad example.

Maybe a better example is Unix pipes. Unix pipes have operations (sort, cut, etc.) that take in text and output text. That is, each takes the same kind of input as it produces as output - a property known as closure. So you can take the output from one tool and string tools together in any order you want. Sometimes it's more efficient to do one operation before another. In SPARQL you can't do that, as the first operation of every query turns the graph into variable bindings.

I was hoping that SPARQL would be the Unix pipes of RDF. This would mean that the operations like join, filter, restrict (matching triples) and so on take in an RDF graph (or graphs) and output an RDF graph (or graphs). This gives tremendous flexibility in that you can create new operations that all work on the same model. It also means that a lot of the extra complexity that is part of SPARQL (for example, CONSTRUCT and ASK) go away.
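A sketch of what that might look like - every name here is hypothetical, it's just the graph-in, graph-out shape that matters:

```java
import java.util.*;
import java.util.function.Predicate;

public class GraphPipes {
    record Triple(String s, String p, String o) {}

    // Hypothetical closed query API: every operation takes a Graph and
    // returns a Graph, so operations chain like Unix pipes.
    record Graph(Set<Triple> triples) {
        Graph match(Predicate<Triple> pattern) {
            Set<Triple> out = new HashSet<>();
            for (Triple t : triples) if (pattern.test(t)) out.add(t);
            return new Graph(out);
        }

        Graph union(Graph other) {
            Set<Triple> out = new HashSet<>(triples);
            out.addAll(other.triples);
            return new Graph(out);
        }
    }

    public static void main(String[] args) {
        Graph g = new Graph(Set.of(
            new Triple(":a", ":author", ":alice"),
            new Triple(":b", ":author", ":bob"),
            new Triple(":a", ":title", "'Duality'")));

        // Restrict, then restrict again - each step's output is another Graph.
        Graph authorsOfA = g.match(t -> t.p().equals(":author"))
                            .match(t -> t.s().equals(":a"));
        System.out.println(authorsOfA.triples().size()); // 1
    }
}
```

New operations slot in by following the same Graph-to-Graph signature, which is exactly the flexibility the post is after.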

This is not to say that SPARQL doesn't have value and shouldn't be supported. It is just a missed opportunity. It could have avoided repeating mistakes made with SQL (like not standardizing aggregate functions, not having a consistent data model and so on).

Update: I re-read this recently. It struck me that maybe I was being a little unclear about what I expected as input and output in the RDF pipes view of SPARQL. Really, it's not a single RDF graph per se that is being processed but sets of triples. It's not really a big difference - RDF graphs are just sets of triples - but the triples being processed don't have to come from one graph. There's no restriction in what I'm talking about above on processing, in one go, triples from many different graphs. The criticism is the same though - SPARQL breaks triples into variable bindings. Handling multiple graphs (or sets of triples) just requires recording the graph each triple came from (the quad in most systems). It's certainly something that could be added to JRDF's SPARQL implementation.

Friday, March 16, 2007

Radiant RDF

A new and hopefully readable article on XML.com, "A Relational View of the Semantic Web". The main reason is to elicit some feedback and to make a more understandable version of things like this. It also discusses whether or not to use blank nodes when mapping RDF to the relational model (which is something I first published in my thesis). It includes my take, which is no doubt influenced by Simon and working on Kowari, on the DISTINCT issue amongst others.

Update: Well if any feedback is good feedback, having positive feedback is awesome: "But it begs the question: given the relative proximity of RDF/SPARQL to the relational model, why are the Semantic Web standards not closer both syntactically and semantically to it? This is, for me, the major reason which prevents its adoption...Nice to see such a straightforward and well-referenced article outlining the Semantic Web and relational worlds in any case. Thank you."

The question raised in the comment - why not keep pushing SQL? - is one I wish I had a short answer for. Often SPARQL syntax has been chosen to look like SQL. But obviously it's still different enough, because it is a different domain. I still favour a clean break from SQL for that reason.

I wrote about SQL not being relational enough quite a while ago now (nearly 3 years). At the time, I think I was unaware of some of the operations in SQL like INTERSECT and EXCEPT (set difference).

Update 2: Also noted by Nova Spivack, "Excellent Overview of Benefits of RDF and SPARQL".

Thursday, February 15, 2007

All Annotations

JSR-311: a Java API for RESTful Web Services?

I am still not so sure what they want to do exactly, but it seems to be based on JRA, which has annotations of the form:

@Get
@HttpResource(location="/customers")
public List getCustomers();

For me a more interesting JSR would be one that would standardise the @rdf annotations the way so(m)mer is doing it.


The RESTlet framework was created because:

"I noted the lack of a REST framework in Java. The only project that came close to it was 1060 NetKernel developed by 1060 Research but it had too many features for my needs and I found that the mapping of REST concepts was not as direct as I was expecting."


Annotations seem to be an exceedingly useful tool, especially having used Elmo's annotations on interfaces to generate beans from RDF (link to PDF documentation). Annotating interfaces rather than classes seems to be the way to go, given that interfaces can represent relationships like multiple inheritance. Elmo was inspired by subject-oriented programming, which provides a way "to group common behaviours and separate these different roles into unique interfaces and reusable classes." It's especially useful for unplanned extensions (through composition).

Googling for the various projects brought up Tripresso, which looks at mapping RDF to OO (including Java, Ruby, Python and PHP projects). If object-to-relational mapping is the Vietnam of computer science, I wonder what object-to-RDF mapping is.

See the previous discussion, RDF on Rails (mentioning ActiveRDF, RdfReactor, etc.) and Henry Story's original, Java Annotations & the Semantic Web.

Saturday, February 03, 2007

Relational SPARQL?

ARQ : what next? "What next? This release matches the published DAWG working draft but the working group is considering a SPARQL algebra - the formalization of the semantics of SPARQL - to be based on something like the relational algebra."

Wednesday, January 10, 2007

A Quick Overview of Relational SPARQL

I've put a quick overview of the relational SPARQL operations that I developed in JRDF onto the Google Code Wiki, called Relational SPARQL Operations. This is just a reworking of what I've done in my thesis in a hopefully more digestible way. I hope I'll have time to expand it and the JRDF Wiki generally to include more documentation on things like the other features of JRDF.

Wednesday, December 20, 2006

Some Design Issues

Semantic Web Road map "It is clearly important that the query language be defined in terms of RDF logic. For example, to query a server for the author of a resource, one would ask for an assertion of the form "x is the author of p1" for some x. To ask for a definitive list of all authors, one would ask for a set of authors such that any author was in the set and everyone in the set was an author. And so on."

Relational Databases on the Semantic Web "Is the RDF model an entity-relationship mode? Yes and no. It is great as a basis for ER-modelling, but because RDF is used for other things as well, RDF is more general. RDF is a model of entities (nodes) and relationships. If you are used to the "ER" modelling system for data, then the RDF model is basically an openning of the ER model to work on the Web. In typical ER model involved entity types, and for each entity type there are a set of relationships (slots in the typical ER diagram). The RDF model is the same, except that relationships are first class objects: they are identified by a URI, and so anyone can make one. Furthurmore, the set of slots of an object is not defined when the class of an object is defined."

Linked Data "The Semantic Web isn't just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other, related, data."

"So statements which relate things in the two documents must be repeated in each. This clearly is against the first rule of data storage: don't store the same data in two different places: you will have problems keeping it consistent. This is indeed an issue with browsable data. A set of of completely browsable data with links in both directions has to be completely consistent, and that takes coordination, especially if different authors or different programs are involved."

Thursday, December 14, 2006

Relational OWL

As a sequel (not SQL) to relational SPARQL there's relational OWL, "In this paper we analyzed the similarities and differences between relational databases and description logics, in particular with respect to the role of schema constraints. Our analysis reveals more similarities than differences since, in both cases, constraints are just (restricted) first-order theories. Furthermore, reasoning about the schema is in both systems performed under standard first-order semantics, and it even employs closely related reasoning algorithms. The differences between relational databases and description logics become apparent only if one considers the problems related to reasoning about data. In relational databases, answering queries and constraint satisfaction checking correspond to model checking, whereas, in description logics they correspond to entailment problems."

Via All Problems Solved.

Monday, December 11, 2006

The Difference is Antijoin

I made a mistake about making a mistake. I got caught up with the fact that SPARQL's diff is not the same as relational difference, but that's not how left outer join is defined. So I'm correct that relational set difference is not compatible with SPARQL's difference as defined in "The SPARQL Algebra", but wrong that this means the definitions of left outer join are not equivalent.

So antijoin is:
( R1 difference ( project(R1) (R1 join R2) ) )
where project(R1) means projecting onto the attributes of R1.

So using the previous example:
{{ (?x = 2, ?y = 3) }} \ {{ (?y = 3) }} is {} in SPARQL.

Now, as I said using plain difference it's:
{{ (?x = 2, ?y = 3) }} - {{ (?y = 3) }} is {{ (?x = 2, ?y = 3) }} in relational algebra.

But using antijoin the right hand side is really:
  • project ({{ (?x = 2, ?y = 3) }}) ({{ (?x = 2, ?y = 3) }} join {{ (?y = 3) }})
  • project ({{ (?x = 2, ?y = 3) }}) ({{ (?x = 2, ?y = 3) }})
  • {{ (?x = 2, ?y = 3) }}. Which matches the left hand side.
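The decomposition can be checked mechanically in a toy model of bindings as variable-to-value maps - a sketch of the definitions above, not JRDF's actual code:

```java
import java.util.*;

public class AntijoinCheck {
    // Pérez et al compatibility: shared variables must agree.
    static boolean compatible(Map<String, Object> u1, Map<String, Object> u2) {
        for (String v : u1.keySet())
            if (u2.containsKey(v) && !Objects.equals(u1.get(v), u2.get(v)))
                return false;
        return true;
    }

    // SPARQL's diff: keep bindings of r1 with no compatible partner in r2.
    static Set<Map<String, Object>> diff(Set<Map<String, Object>> r1,
                                         Set<Map<String, Object>> r2) {
        Set<Map<String, Object>> out = new HashSet<>();
        for (Map<String, Object> u : r1) {
            boolean anyCompatible = false;
            for (Map<String, Object> u2 : r2)
                if (compatible(u, u2)) anyCompatible = true;
            if (!anyCompatible) out.add(u);
        }
        return out;
    }

    static Set<Map<String, Object>> join(Set<Map<String, Object>> r1,
                                         Set<Map<String, Object>> r2) {
        Set<Map<String, Object>> out = new HashSet<>();
        for (Map<String, Object> u1 : r1)
            for (Map<String, Object> u2 : r2)
                if (compatible(u1, u2)) {
                    Map<String, Object> m = new HashMap<>(u1);
                    m.putAll(u2);
                    out.add(m);
                }
        return out;
    }

    static Set<Map<String, Object>> project(Set<Map<String, Object>> r,
                                            Set<String> vars) {
        Set<Map<String, Object>> out = new HashSet<>();
        for (Map<String, Object> u : r) {
            Map<String, Object> m = new HashMap<>(u);
            m.keySet().retainAll(vars);
            out.add(m);
        }
        return out;
    }

    // Antijoin: r1 minus (project onto r1's heading of (r1 join r2)),
    // where minus is plain set difference over whole bindings.
    static Set<Map<String, Object>> antijoin(Set<Map<String, Object>> r1,
                                             Set<Map<String, Object>> r2,
                                             Set<String> headingOfR1) {
        Set<Map<String, Object>> out = new HashSet<>(r1);
        out.removeAll(project(join(r1, r2), headingOfR1));
        return out;
    }

    public static void main(String[] args) {
        Set<Map<String, Object>> r1 = Set.of(Map.<String, Object>of("x", 2, "y", 3));
        Set<Map<String, Object>> r2 = Set.of(Map.<String, Object>of("y", 3));
        // Both give {} on the example from the post.
        System.out.println(diff(r1, r2));
        System.out.println(antijoin(r1, r2, Set.of("x", "y")));
    }
}
```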

This is why JRDF looked like it was producing the right result - it was doing the same thing, just with the operations further decomposed. I'm sure I've worked this out before, but for some reason I was certain I was wrong the other day. I was told that antijoin and SPARQL's diff were equivalent when the name was changed, but I had to question it.

There still remains an outstanding issue around null joins, but maybe that's not needed either. I know that the NULLs in "SPARQL RULES!" are required for Logic Programs, but they don't need to be there for relational algebra as long as you have the null-accepting/untyped join.

So SPARQL in JRDF looks fine again.

Wednesday, December 06, 2006

JRDF GUI Takedown

The other day I decided to remove the JRDF GUI from Sourceforge. Subsequently, there has been a need to download it again. :-) It's here for the time being:
http://jrdf.sf.net/jrdf-gui-0.3.jar

Like I've said, none of the operations implemented in JRDF match SPARQL's. In order not to throw away the work I've done, I'm going to spend some time soon renaming it and changing the syntax (to be more like Tutorial D). The name might be UQL (Unknown Query Language), RRQL (Relational RDF Query Language) or DRQL (Tutorial D-like RDF Query Language). Any suggestions?

Sunday, December 03, 2006

SPARQL Favours the Brave

SPARQL RULES! "Now, as opposed to [17], we define three notions of compatibility between substitutions:
  • Two substitutions O1 and O2 are bravely compatible (b-compatible) when for all x <- dom(O1) Intersection dom(O2) either xO1 = null or xO2 = null or xO1 = xO2 holds. i.e., when O1 Union O2 is a substitution over dom(O1) Union dom(O2).
  • Two substitutions O1 and O2 are cautiously compatible (c-compatible) when they are b-compatible and for all x <- dom(O1) Intersection dom(O2) it holds that xO1 = xO2.
  • Two substitutions O1 and O2 are strictly compatible (s-compatible) when they are c-compatible and for all x in dom(O1) Intersection dom(O2) it holds that x(O1 Union O2) /= null.
"

They make the conclusion that only c-joins operate correctly with idempotency for join and, "Following the definitions from the SPARQL specification, in fact, the b-joining semantics is the only admissible definition, which is why [17] does not consider null values at all. There are still advantages for gradually defining alternatives towards traditional relational algebra treatment. On the one hand, as we have seen in the examples above, the brave view on joining unbound variables might have partly surprising results, on the other hand, as we will see, the c- and s-joining semantics allow for a more effective implementation in terms of Datalog rules."

I'm surprised at how badly I've understood the proposed SPARQL algebra; I didn't think that joining on nulls was being proposed (see Example 2.4 on page 9 of the PDF). Again, this conflicts with relational algebra and JRDF's implementation.

A join in relational algebra does match the suggested s-compatible one: you can't join on NULLs (because they are considered unbound values - they aren't "real", they don't exist and aren't values you can compare for equality). As shown in the paper, it wouldn't successfully join a null name and would return only the row containing "Alice".
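The three compatibility notions from the paper can be sketched over variable-to-value maps, with null standing for an unbound value. This is a toy model of the definitions quoted above, not the paper's Datalog encoding:

```java
import java.util.*;

public class Compatibility {
    // b-compatible: shared variables agree, or one side is unbound (null).
    static boolean bCompatible(Map<String, Object> u1, Map<String, Object> u2) {
        for (String x : u1.keySet())
            if (u2.containsKey(x)) {
                Object a = u1.get(x), b = u2.get(x);
                if (a != null && b != null && !a.equals(b)) return false;
            }
        return true;
    }

    // c-compatible: b-compatible, and shared variables are actually equal.
    static boolean cCompatible(Map<String, Object> u1, Map<String, Object> u2) {
        if (!bCompatible(u1, u2)) return false;
        for (String x : u1.keySet())
            if (u2.containsKey(x) && !Objects.equals(u1.get(x), u2.get(x)))
                return false;
        return true;
    }

    // s-compatible: c-compatible, and no shared variable is unbound on both sides.
    static boolean sCompatible(Map<String, Object> u1, Map<String, Object> u2) {
        if (!cCompatible(u1, u2)) return false;
        for (String x : u1.keySet())
            if (u2.containsKey(x) && u1.get(x) == null && u2.get(x) == null)
                return false;
        return true;
    }

    public static void main(String[] args) {
        Map<String, Object> bound = new HashMap<>();
        bound.put("name", "Alice");
        Map<String, Object> unbound = new HashMap<>();
        unbound.put("name", null);
        // A bound value joins with an unbound one only under the brave semantics.
        System.out.println(bCompatible(bound, unbound)); // true
        System.out.println(cCompatible(bound, unbound)); // false
    }
}
```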

In SQL's 3VL logic, "NULL literally means that the value is unknown or indeterminate. One side effect of the indeterminate nature of NULL value is it cannot be used in a calculation or a comparision." This means you can't join across NULLs in SQL or use them for aggregate functions except for COUNT. NULLs are also handled differently across SQL implementations. One example that I've come across before is the handling of strings and NULL values between DB2 and Oracle.

It reminds me how much I hate IEEE citations (I think that's the standard), where [17] is the reference to Pérez in this paper and it's [22] in another. What's wrong with [Perez2006] or something?

Anyway, they also suggest: a set difference or minus operator (which I've suggested SPARQL would be incomplete without, even though you can do it with FILTER, and which iTQL had for a while before that), nested queries using ASK, and using SPARQL as a rules language.

This will probably be my last post on SPARQL for a while - I'm a bit sick of continually, publicly displaying my ignorance (which I guess has occurred over the last year). I could change JRDF to fit any of the proposed implementations (and every other permutation) but I'm not sure it makes sense. I feel very far away from understanding what's going on and I doubt I will (or should) be commenting about it too much in the future.

Update: I seem to have confused s and c semantics. I've updated the paragraph related to relational algebra - it does match s-compatibility.

Saturday, December 02, 2006

Breaking Project

SPARQL Basic Graph Pattern Matching Andy writes, "Example data:
  :a :p 1 .
  :a :p 2 .
Example query pattern:
  { ?x :p [] }
How many answers? Blank nodes are existential variables in RDF; named variables (the regular ?x ones) are universal variables. Queries don't return the binding of an existential; queries can return the binding of a named variable and the bound value of a named variables can be passed to other parts of the query (FILTERs, OPTIONALs etc) via the algebra.

In the absence of a DISTINCT in the query, are the solutions to this pattern:
  • 1 : ?x = :a
  • 2 : ?x = :a , ?x = :a
  • Either 1 or 2
"
I would say there is a fourth option here, see this example, which would be:
  • { { ?x:subject = :a, p1:predicate1 = :p, o1:object1 = 1 } , { ?x:subject = :a, p1:predicate1 = :p, o1:object1 = 2 } }
The DAWG test case, rdfSemantics-bNode-type-var, redefines the meaning of project.

If you keep track of where the variables were bound, you can still count the distinct instances without redefining project.

Thursday, November 30, 2006

Different Strokes

I feel I should make a correction to comments I made in regards to the proposed SPARQL algebra.

In that post and in my thesis I've said that the definitions of OPTIONAL by Pérez et al (and used by Andy Seaborne) are compatible with the relational definitions I used. I'm now about as certain as I can be that that's not the case.

The definition of set difference in Pérez et al is (in ASCII):
O1 \ O2 = {u <- O1 | for all u' <- O2, u and u' are not compatible}.

The key here is the definition of compatibility. I didn't find it clear in the original paper (my fault, and I'll explain a bit more about this later) but it's explicit in "Semantics of SPARQL", which says, "Two mappings with disjoint domains are always compatible, and the empty mapping u0 is compatible with any other mapping. Intuitively, u1 and u2 are compatibles if u1 can be extended with u2 to obtain a new mapping, and vice versa."

A relational join is defined as the set union of the headings and the matching values. The heading consists of the attributes (X, Y, Z), where X contains the attributes only in the left hand side relation, Y the attributes common to both relations, and Z the attributes only in the right hand side relation. The body consists of tuples with values matching X, Y and Z (Date writes it as { X x, Y y, Z z }). I think that this is compatible with the Pérez et al definition.

But it's different for difference. The difference operator in relational algebra requires the relations to be the same type to be removed (this is the definition I've used). I'll use "\" to represent SPARQL's set difference and "-" to represent the relational difference.

For example:
{{ (?x = 2, ?y = 3) }} \ {{ (?y = 3) }} is {} in SPARQL
{{ (?x = 2, ?y = 3) }} - {{ (?y = 3) }} is {{ (?x = 2, ?y = 3) }} in the modified Galindo-Legaria relational algebra that JRDF uses.

The reason it is unchanged in relational algebra is that the definition of equality requires the operands to be of the same type (the same attributes). So a binding with one value can't be equal to a binding with two values. The Pérez et al definition is much looser, and many more matches are made.

As an aside, I was fairly sure that set difference was a requirement for OWL inferencing. I might be wrong here too. If it is the case, though, I'd think it would be nice (an efficient use of operations) if SPARQL operations could be reused for OWL ones. This would also allow the difference operator to be expressed in SPARQL by itself.

So I think I understand the issue now. At the moment I'm trying to work out how JRDF gets what seem to be the right results, because it doesn't really look like it should. The other thing that was in my paper was a mapping to SQL, and I'm not sure how a SQL-based SPARQL implementation would work either (based on the definitions I used). Lots of questions anyway.

I am trying to work out whether this definition of compatibility is a good one. At the moment I neither like it nor dislike it, because I don't understand it yet.

The trigger was the removal of the term antijoin from the SPARQL algebra - that was a good idea, because I had understood the term to mean it was compatible with relational algebra (using the same terms, I guess). Up until that point I'd assumed difference in SPARQL was compatible with difference in relational algebra.

Update: Paul mentions in the comments, that the SPARQL difference operator is the one needed for OWL. Another win for that definition then too.

It's apparent that the relational model really isn't appropriate to model SPARQL in. I don't think there's a single definition of an operation in relational algebra that hasn't been changed.

Update 2: I've since come to realize that joins are not compatible either. This is due to the suggestion that you can join on NULL values. This contradicts relational algebra (taking either the Date or Codd approach on NULL) and SQL joins (not that I think that's worth much but it's easy to demonstrate).

Update 3: Last update for this post, I hope. While relational difference is not equivalent, antijoin is equivalent - which is what I was told when antijoin was renamed diff in the first place. More information: "The Difference is Antijoin".

Thursday, November 09, 2006

Mr Sparkle

Just posted "Applying the relational model to SPARQL" to the DAWG comments list. It describes the work done adding SPARQL support (or small bits of it, anyway) to JRDF. An HTML version is to follow.

Direct link (PDF ~500K).

BTW, if anyone notices any grammatical errors please let me know and I'll fix them right up. I get to the point where I can't read what I've written anymore so there's bound to be a few errors.

Update: I've updated it with some small typos fixed, the formatting tidied, and the example relations fixed up (some conversions of S1 -> Supplier1 that I missed).

Update 2: The HTML version is now available. Many thanks to Eric Prud'hommeaux for the work he did converting the original PDF.

Wednesday, November 08, 2006

Why Semijoin is Better than Join

Query Processing in a System for Distributed Databases (SDD-1) "We prefer semijoins to joins for three reasons. First, Ri semijoin(A = B) Rj subsumes Ri, and so semijoins monotonically reduce the size of the database. By contrast, joins can increase the size of the database; in the worst case...Second, semijoins can be computed with less intersite data transfer than joins...we need only transmit a projection of a relation...the third advantage of semijoins is that the "reductive effect" of any single join can be attained by two semijoins, usually at lower cost..."
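The "reductive" property is visible in even a toy semijoin (all names here are made up): only the other relation's projection on the join attribute is needed, and the result is always a subset of the input:

```java
import java.util.*;

public class Semijoin {
    record Row(String a, String b) {}

    // Semijoin on attribute a: rows of r that have at least one partner in s.
    // The result is always a subset of r, so it can only shrink the data,
    // whereas a join can multiply rows. Only s's projection on the join
    // attribute (sKeys) needs to be shipped between sites.
    static List<Row> semijoin(List<Row> r, Set<String> sKeys) {
        List<Row> out = new ArrayList<>();
        for (Row row : r) if (sKeys.contains(row.a())) out.add(row);
        return out;
    }

    public static void main(String[] args) {
        List<Row> r = List.of(new Row("k1", "x"), new Row("k2", "y"));
        Set<String> sKeys = Set.of("k1"); // projection of the other relation
        System.out.println(semijoin(r, sKeys).size()); // 1
    }
}
```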

This is one of the first times in many years that I've had to resort to a search engine other than Google (Yahoo, in this case) to find a paper.

Monday, September 25, 2006

JRDF SPARQL Performance

I did some performance gathering of JRDF's implementation of SPARQL using some of the FOAF data from Mindswap.

Average for Query 1, 100,000 triples:
* Jena (ARQ 0.9.2) - 14685 ms
* JRDF and JRDF using Tuple Subsumption - 3652 ms

There is only one UNION implementation in JRDF.

Average for Query 2, 100,000 triples:
* Jena (ARQ 0.9.2) - 22872 ms
* JRDF - 6615 ms
* JRDF using Tuple Subsumption - 3750 ms

Average for Query 3, 100,000 triples:
* Jena (ARQ 0.9.2) - 15306 ms
* JRDF - 8019 ms
* JRDF using Tuple Subsumption - 4780 ms

The point is not that it was faster than Jena (although yay!), it's that the relational optimisations had a positive effect on querying speed. The current downloadable version of the JRDF SPARQL GUI (0.2) has very slow versions of the hash code and equals methods for performance-sensitive classes like AttributeValuePair. It still showed the benefit of the optimisations but was 10 times slower to answer the queries. The modified version of these classes is only available in the JRDF Subversion repository.

Thursday, September 21, 2006

A Better Way

There's a couple of papers I've been reading about fusing functional programming and the relational model, and the influential "Can Programming Be Liberated from the von Neumann Style? A Functional Style and Its Algebra of Programs".

Also we have Dijkstra's, "On the cruelty of really teaching computing science". Firstly, he states that people are by nature conservative and often use previous experience as the basis to learn something new, "...radical novelties are so disturbing that they tend to be suppressed or ignored, to the extent that even the possibility of their existence in general is more often denied than admitted."

So computing is generally quite different but people continue to use the wrong metaphors and analogies: "The practice is pervaded by the reassuring illusion that programs are just devices like any others, the only difference admitted being that their manufacture might require a new type of craftsmen, viz. programmers. From there it is only a small step to measuring "programmer productivity" in terms of "number of lines of code produced per month". This is a very costly measuring unit because it encourages the writing of insipid code, but today I am less interested in how foolish a unit it is from even a pure business point of view. My point today is that, if we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent": the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger.

Besides the notion of productivity, also that of quality control continues to be distorted by the reassuring illusion that what works with other devices works with programs as well. It is now two decades since it was pointed out that program testing may convincingly demonstrate the presence of bugs, but can never demonstrate their absence. After quoting this well-publicized remark devoutly, the software engineer returns to the order of the day and continues to refine his testing strategies, just like the alchemist of yore, who continued to refine his chrysocosmic purifications."

The solution is to teach students correctly: the basics are not to be found in Pascal or C or Java but in logic and mathematics: "Right from the beginning, and all through the course, we stress that the programmer's task is not just to write down a program, but that his main task is to give a formal proof that the program he proposes meets the equally formal functional specification. While designing proofs and programs hand in hand, the student gets ample opportunity to perfect his manipulative agility with the predicate calculus. Finally, in order to drive home the message that this introductory programming course is primarily a course in formal mathematics, we see to it that the programming language in question has not been implemented on campus so that students are protected from the temptation to test their programs. And this concludes the sketch of my proposal for an introductory programming course for freshmen."
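A small taste of "designing proofs and programs hand in hand": a loop annotated with the invariant that justifies it. The comments carry the proof obligation Dijkstra has in mind; the code itself is just an illustration, not from his lecture.

```java
// A trivially small program developed alongside its correctness argument.
public class SumSpec {
    // Specification: return a[0] + a[1] + ... + a[a.length - 1].
    static int sum(int[] a) {
        int total = 0;
        int i = 0;
        // Invariant: total == a[0] + ... + a[i-1], with 0 <= i <= a.length.
        // Trivially true on entry (empty sum, i == 0).
        while (i < a.length) {
            total += a[i];
            i++;
            // Invariant re-established: total now includes a[i-1].
        }
        // On exit i == a.length, so by the invariant
        // total == a[0] + ... + a[a.length-1]: exactly the specification.
        return total;
    }
}
```

The loop terminates because a.length - i strictly decreases, which completes the informal proof.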

Monday, September 18, 2006

Unicode Maths Font

Does Your Browser Support Multi-language? is potentially the only available source of a font covering Miscellaneous Mathematical Symbols-A, which has the left outer join symbol. I downloaded about half a dozen different fonts and none of them had it. Many thanks to Simon.

Wednesday, September 06, 2006

Relational.OWL

Relational.OWL - A Data and Schema Representation Format Based on OWL "In this paper we introduce a Web Ontology Language (OWL)-based (Miller & Hendler 2004) representation format for relational data and schema components, which is particularly appropriate for exchanging items among remote database systems. OWL, originally created for the Semantic Web enables us to represent not only the data itself, but also its interpretation, i.e. knowledge about its format, its origin, its usage, or its original embedment in specific frameworks.

Hence, remote databases are instantly able to understand each other without having to arrange an explicit exchange format - the usage of OWL on both sides is sufficient. This would be impossible using present XML formats."

"Bizer introduced in (Bizer 2003) a mapping language between relational data and RDF, particularly between specific relational query results and RDF. Contrary to our approach, D2R MAP converts the stored data into ”real” RDF objects, i.e. an address would be represented as a RDF address ob ject. This approach takes into account, that the original database cannot be reconstructed using this kind of data representation anymore, since it does not contain information concerning the original schema of the database. As a result, the data represented with the D2R MAP language looses its relationship to the original database. Tracing the data to its original storage position is thus hardly possible."

The homepage has other papers (such as "Database to Semantic Web Mapping using RDF Query Languages" and the good "Bringing Relational Data into the Semantic Web using SPARQL and Relational.OWL") and documentation, with beta versions of the software written in Java (using Jena).

D2R Server 0.3 was released recently too.