Friday, November 07, 2003

Storing RDF

Workshop on Semantic Web Storage and Retrieval - Position Papers.

From "Indexing and retrieving Semantic Web resources: the RDFStore model", by ASemantics:

"The number of 'sub queries' needed to satisfy a single RDF query is often several orders of magnitude larger than commonly seen in RDBMS applications. Also, very often to save space DBAs design tables using significant number of indirect references, deferring to the application, or a stored procedure layer, for expanding the operation into large numbers of additional join operations just to store, retrieve or delete a single atomic statement from the database and maintaining consistency."

"Such indexes map the RDF nodes, contexts and free-text words contained into the literals to statements. There are several advantages to this approach. First, the use of a hybrid run-length and variable-length encoding to compress the indexes makes the resulting data store much more compact. Second, the use of bitmaps and Boolean operations allows matching arbitrary complicated queries with conjunction, disjunction and free-text words without using backtracking and recursion techniques. Third, this technique gives fine-grained control over the actual database content."
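To make the bitmap idea concrete, here is a minimal sketch of my own. It is not RDFStore's code: the class and method names are made up, Python integers stand in for the bitmaps, and the compression the paper describes is left out entirely. Each term (node or free-text word) maps to a bitmap with one bit per statement, so a conjunction is a bitwise AND and a disjunction is a bitwise OR, with no backtracking or recursion.

```python
from collections import defaultdict


class BitmapIndex:
    """Toy index: each term maps to a bitmap with one bit per statement."""

    def __init__(self):
        self.statements = []             # statement number -> (s, p, o)
        self.bitmaps = defaultdict(int)  # term -> int used as a bitset

    def add(self, s, p, o, literal=False):
        n = len(self.statements)
        self.statements.append((s, p, o))
        for term in (s, p, o):
            self.bitmaps[term] |= 1 << n
        if literal:
            # Index the free-text words of a literal under a separate key
            # so they cannot clash with node names.
            for word in o.lower().split():
                self.bitmaps[("word", word)] |= 1 << n
        return n

    def bitmap(self, term):
        return self.bitmaps.get(term, 0)

    def decode(self, bits):
        """Turn a bitmap back into the list of matching statements."""
        return [self.statements[i]
                for i in range(bits.bit_length()) if bits >> i & 1]


idx = BitmapIndex()
idx.add("ex:doc1", "dc:title", "Storing RDF in databases", literal=True)
idx.add("ex:doc2", "dc:title", "Querying RDF graphs", literal=True)
idx.add("ex:doc1", "dc:creator", "ex:alice")

# Conjunction: statements with predicate dc:title AND the word "rdf".
titles_with_rdf = idx.bitmap("dc:title") & idx.bitmap(("word", "rdf"))

# Disjunction: statements mentioning ex:doc2 OR ex:alice.
either = idx.bitmap("ex:doc2") | idx.bitmap("ex:alice")

print(idx.decode(titles_with_rdf))
print(idx.decode(either))
```

A real store would keep the bitmaps compressed on disk; the point here is only that arbitrary AND/OR combinations reduce to a handful of Boolean operations.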

I'm fairly sure that using hashing for databases is the wrong approach; something that guarantees unique identifiers, like node pools and string pools, seems much more sensible:
"For efficient storage and retrieval of statements and their components we assume there exist some hash functions which generates a unique CRC64 integer number for a given MD5 or SHA-1 cryptographic digest representation of statements and nodes of the graph."
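A rough sketch of what I mean by a pool, with hypothetical names (NodePool, intern, lookup and tiny_hash are my own, not from the paper or any particular store). The pool hands out the next free integer for each new value, so two different values can never share an identifier. A hash reduced to a fixed number of bits cannot promise that; to make the effect visible I shrink the hash to 8 bits, which is an exaggeration. At 64 bits collisions are far less likely, but still not impossible.

```python
import hashlib


class NodePool:
    """Assigns each distinct value the next free integer identifier."""

    def __init__(self):
        # The dict compares full values on lookup, so distinct values
        # always end up with distinct identifiers.
        self._ids = {}
        self._values = []

    def intern(self, value):
        node_id = self._ids.get(value)
        if node_id is None:
            node_id = len(self._values)
            self._ids[value] = node_id
            self._values.append(value)
        return node_id

    def lookup(self, node_id):
        return self._values[node_id]


def tiny_hash(value, bits=8):
    """Digest-derived identifier, deliberately cut down to a few bits."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % (1 << bits)


values = [f"http://example.org/node/{i}" for i in range(300)]

pool = NodePool()
pool_ids = {pool.intern(v) for v in values}
hash_ids = {tiny_hash(v) for v in values}

print(len(pool_ids))        # one identifier per value
print(len(hash_ids) < 300)  # 300 values cannot fit in 256 hash slots
```

The pool also gives the reverse mapping for free (lookup), which a one-way hash does not.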

Also of interest: Prolog-based RDF storage and retrieval (optimising a store for the rdfs:subPropertyOf relation) and the Jena position paper.
