Sunday, January 23, 2005

Using RDF to improve Object-First Development

There are many ways to create systems, including a "relational-first" approach and an "object-first" approach.

The example in these two pieces uses two classes: Person and Employee. Person has a first name, last name and age; Employee also has an ID and a salary. How do you model these objects?

"One approach would be to create two tables, PERSON and EMPLOYEE, and use a foreign-key relationship to tie rows from one to the other. This will require a join between these two tables every time we want to work with a given Employee, which requires greater work on the part of the database on every query and modification to the data. We could store both Person and Employee data into a single EMPLOYEE table, but then when we create Student (extending Person), and want to find all Persons whose last name is Smith, we'll have to search both STUDENT and EMPLOYEE tables, neither of which at a relational level have anything to do with one another. And if this inheritance layer gets any deeper, we're just compounding the problem even further, almost exponentially."

"As if the above weren't enough, more frequently than not, the enterprise developer doesn't have control over the database schema--it's one that's already in use, either by legacy systems or other J2EE systems, or the schema has been laid down by developers in other groups. So even if we wanted to build a table structure to elegantly match the object model we built above, we can't arbitrarily change the schema definitions."

In an RDF-based system, extending these schemas is not a problem; it is almost expected. RDF-specific databases, like Kowari, can avoid joining tables to represent these two objects because everything is triples. A subject that is a Person can be made an Employee by adding two statements: one saying the subject has an ID and one saying it has a salary.
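To make the "two extra statements" idea concrete, here is a minimal sketch using a plain in-memory Python set of (subject, predicate, object) tuples. It is not Kowari's API; the predicate names (`ex:firstName`, `ex:employeeId`, `ex:salary`) and the rule "a subject with both an ID and a salary counts as an Employee" are assumptions made for illustration.

```python
# Every fact is a (subject, predicate, object) tuple; there are no tables to join.
# Predicate names like "ex:firstName" are hypothetical, chosen for illustration.
Triple = tuple[str, str, object]

store: set[Triple] = {
    ("ex:alice", "ex:firstName", "Alice"),
    ("ex:alice", "ex:lastName", "Smith"),
    ("ex:alice", "ex:age", 34),
}


def properties(subject: str) -> dict[str, object]:
    """Collect every predicate/object pair recorded for one subject."""
    return {p: o for s, p, o in store if s == subject}


def is_employee(subject: str) -> bool:
    """Assumed rule: a subject is an Employee once it has both an ID and a salary."""
    props = properties(subject)
    return "ex:employeeId" in props and "ex:salary" in props


print(is_employee("ex:alice"))  # False: only the Person statements exist so far

# Two extra statements turn the Person into an Employee; no new table, no join.
store.add(("ex:alice", "ex:employeeId", "E-1001"))
store.add(("ex:alice", "ex:salary", 50000))

print(is_employee("ex:alice"))  # True
```

The Person statements are never copied or moved; the Employee data simply sits alongside them under the same subject.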

Another advantage of an RDF system, especially once you start using ideas like resolvers, is that it lets developers integrate existing data and create application-specific database schemas. All data can be converted and adapted to the view the application requires. Objects can be filled out as they need to be viewed, potentially allowing a Person object to be viewed as an Employee that simply has missing values. This can be achieved by applying a specific semantic for missing values (not recorded, not available, etc.) or by inferring missing details from other data. You can, of course, take the normal approach and only show objects that are valid without modification. The point is that the flexibility is there.
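One way to picture "viewing a Person as an Employee with missing values" is a view builder that requires the Person fields but fills Employee-only fields with an explicit marker. This is a hypothetical sketch: the `Missing` enum, `EmployeeView` and `as_employee` are illustrative names, not part of any RDF library.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class Missing(Enum):
    """Hypothetical markers that give an absent value a specific meaning."""
    NOT_RECORDED = "not recorded"
    NOT_AVAILABLE = "not available"


@dataclass
class EmployeeView:
    first_name: str
    last_name: str
    age: int
    employee_id: Union[str, Missing]
    salary: Union[int, Missing]


def as_employee(props: dict, default: Missing = Missing.NOT_RECORDED) -> EmployeeView:
    """Fill out an Employee view from whatever statements exist for a subject.

    Person fields are required (a KeyError means the data is not even a Person);
    Employee-only fields fall back to a missing-value marker instead of
    making the whole object invalid.
    """
    return EmployeeView(
        first_name=props["firstName"],
        last_name=props["lastName"],
        age=props["age"],
        employee_id=props.get("employeeId", default),
        salary=props.get("salary", default),
    )


bob = {"firstName": "Bob", "lastName": "Jones", "age": 41}  # a plain Person
view = as_employee(bob)
print(view.salary.value)  # not recorded
```

Inferring a value from other data would replace the `props.get(..., default)` fallback with a lookup of its own; rejecting `bob` outright corresponds to the "only show valid objects" approach.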

This approach avoids both the "smaller than an object" and "smaller than a row (relation)" problems: the first by creating ad-hoc objects, and the second because each item in RDF can be individually picked up, at the triple level, and used.
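Both points can be sketched with a triple-pattern matcher over a small in-memory store. The data and the `match`/`project` helpers are hypothetical; the query answers the "all Persons whose last name is Smith" question from the quoted article without caring whether a subject is an Employee or a Student.

```python
from typing import Iterator, Optional

# Hypothetical data: an Employee (has a salary) and a Student (has a school)
# share the Person predicates without living in separate tables.
store = {
    ("ex:alice", "ex:lastName", "Smith"),
    ("ex:alice", "ex:salary", 50000),
    ("ex:carol", "ex:lastName", "Smith"),
    ("ex:carol", "ex:school", "ex:mit"),
    ("ex:dave", "ex:lastName", "Brown"),
}


def match(s: Optional[str] = None, p: Optional[str] = None,
          o: object = None) -> Iterator[tuple]:
    """Yield triples matching a pattern; None is a wildcard in any position."""
    for triple in store:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


def project(subject: str, predicates: set) -> dict:
    """Build an ad-hoc object holding only the requested predicates."""
    return {p: o for _, p, o in match(s=subject) if p in predicates}


# "Smaller than a row": pick up just the lastName statements, whatever the class.
smiths = sorted(s for s, _, _ in match(p="ex:lastName", o="Smith"))
print(smiths)  # ['ex:alice', 'ex:carol']

# "Smaller than an object": fetch only the field the caller needs.
print(project("ex:alice", {"ex:salary"}))  # {'ex:salary': 50000}
```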

A third approach is also mentioned: "procedural-first", which creates an encapsulation layer in the style of the Session Facade pattern, so that the actual way in which data is persisted is abstracted away.
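A facade of this kind exposes coarse-grained operations and hides the storage strategy. The sketch below is a hypothetical illustration in Python rather than a J2EE Session Facade; `PersonStore`, the two backends and `PersonFacade` are invented names, and the point is only that callers get the same answer whether the data lives in rows or in triples.

```python
from abc import ABC, abstractmethod


class PersonStore(ABC):
    """Persistence strategy hidden behind the facade (hypothetical interface)."""

    @abstractmethod
    def last_names(self) -> dict[str, str]:
        """Map each person's identifier to their last name."""


class TripleBackedStore(PersonStore):
    def __init__(self, triples):
        self.triples = triples  # (subject, predicate, object) tuples

    def last_names(self) -> dict[str, str]:
        return {s: o for s, p, o in self.triples if p == "ex:lastName"}


class RowBackedStore(PersonStore):
    def __init__(self, rows):
        self.rows = rows  # (id, first_name, last_name) rows, as from a PERSON table

    def last_names(self) -> dict[str, str]:
        return {row_id: last for row_id, _first, last in self.rows}


class PersonFacade:
    """Coarse-grained operations; callers never see how data is persisted."""

    def __init__(self, store: PersonStore):
        self._store = store

    def find_by_last_name(self, last_name: str) -> list[str]:
        return sorted(k for k, v in self._store.last_names().items() if v == last_name)


triples = {("ex:alice", "ex:lastName", "Smith"), ("ex:dave", "ex:lastName", "Brown")}
rows = [("ex:alice", "Alice", "Smith"), ("ex:dave", "Dave", "Brown")]
for facade in (PersonFacade(TripleBackedStore(triples)), PersonFacade(RowBackedStore(rows))):
    print(facade.find_by_last_name("Smith"))  # ['ex:alice'] for both backends
```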

Apart from the usual suspects (mentioned in the articles above), there are quite a few related ideas. Two I've come across recently are:
* JoSQL, which is like SQL meets resolvers.
* SQLObject is an object-relational mapper. It allows you to translate RDBMS table rows into Python objects, and manipulate those objects to transparently manipulate the database.

Update: Danny wrote up a schema using RDFS to represent the classes.
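For readers unfamiliar with RDFS, here is a hypothetical sketch of what such a schema can express. It is not Danny's schema: the `ex:` terms are invented, and the small forward-chaining loop implements only two RDFS entailment rules, rdfs2 (a property's `rdfs:domain` types its subject) and rdfs9 (`rdfs:subClassOf` propagates types), so that a subject with a salary is inferred to be both an Employee and a Person.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"

# Hypothetical schema: Employee and Student both specialise Person.
schema = {
    ("ex:Person", RDF_TYPE, "rdfs:Class"),
    ("ex:Employee", SUBCLASS, "ex:Person"),
    ("ex:Student", SUBCLASS, "ex:Person"),
    ("ex:lastName", DOMAIN, "ex:Person"),
    ("ex:salary", DOMAIN, "ex:Employee"),
}

data = {
    ("ex:alice", "ex:lastName", "Smith"),
    ("ex:alice", "ex:salary", 50000),
}


def infer_types(schema, data):
    """Apply rdfs2 and rdfs9 repeatedly until no new triples appear."""
    triples = set(schema) | set(data)
    while True:
        new = set()
        for s, p, o in triples:
            if p == DOMAIN:
                # rdfs2: property s has domain o, so any x using s is an o
                new |= {(x, RDF_TYPE, o) for x, q, _ in triples if q == s}
            if p == SUBCLASS:
                # rdfs9: s is a subclass of o, so any instance of s is an o
                new |= {(x, RDF_TYPE, o) for x, q, c in triples if q == RDF_TYPE and c == s}
        if new <= triples:
            return triples
        triples |= new


types = {o for s, p, o in infer_types(schema, data) if s == "ex:alice" and p == RDF_TYPE}
print(sorted(types))  # ['ex:Employee', 'ex:Person']
```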
