Friday, April 06, 2007


ETech '07 Summary - Part 2 - MegaData.

Here's the thing: we need a new kind of data store, a new kind of SQL, something that does for storing and querying large amounts of data what SQL did for normalized data.

Sure, you can store a lot of data in a relational database, but when I say large, I mean really large: a billion or more records. I know we need this because I keep seeing people build it.

All this talk about making SPARQL behave like SQL may be for nothing if people realize that's not what they need after all.

A back-of-the-envelope scalability figure for an RDF store would potentially be hundreds of billions of statements.

The key requirements highlighted are: distributed, joinless (no referential integrity at the store level), denormalized and transactionless.
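To make those four requirements a little more concrete, here is a minimal sketch in Python. It is a toy, in-memory stand-in and not anyone's actual system: the class name `ShardedStore`, the use of an MD5 hash to pick a shard, and the record layout are all my own assumptions for illustration.

```python
import hashlib


class ShardedStore:
    """Hypothetical toy store: each dict stands in for a separate machine."""

    def __init__(self, num_shards=4):
        # Distributed: records are spread across independent shards.
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, key):
        # Hash the key so the same key always maps to the same shard.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def put(self, key, record):
        # Transactionless: a write touches exactly one key on one shard,
        # and nothing coordinates it with writes to other keys.
        self.shards[self._shard_for(key)][key] = dict(record)

    def get(self, key):
        # Joinless: a read is a single key lookup; the store never follows
        # references between records or checks that they resolve.
        return self.shards[self._shard_for(key)].get(key)


store = ShardedStore(num_shards=4)

# Denormalized: the author's name is copied into the post record instead of
# being kept in a separate "authors" table and joined in at query time.
store.put("post:1", {"title": "MegaData", "author_id": "user:7", "author_name": "Ann"})
post = store.get("post:1")
```

The trade-off is the usual one: if the author's name changes, every copy has to be rewritten by the application, because the store itself makes no promise about consistency between records.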

I was aware of this because a comment linked to one of my previous posts about Kowari scalability (which I must have snuck through at some stage). Kowari got up to 10,000 triples/second later in its life.