Tuesday, April 10, 2007

Lucene for the Semantic Web

Google's [WWW] Bigtable, a distributed storage system for structured data, is a very effective mechanism for storing very large amounts of data in a distributed environment.

Just as Bigtable leverages the distributed data storage provided by the [WWW] Google File System, Hbase will provide Bigtable-like capabilities on top of Hadoop.

Data is organized into tables, rows and columns, but a query language like SQL is not supported. Instead, an Iterator-like interface is available for scanning through a row range (and of course there is an ability to retrieve a column value for a specific key).

Any particular column may have multiple values for the same row key. A secondary key can be provided to select a particular value or an Iterator can be set up to scan through the key-value pairs for that column given a specific row key.

From the Hbase/HbaseArchitecture page:
HBase uses a data model very similar to that of Bigtable. Users store data rows in labelled tables. A data row has a sortable key and an arbitrary number of columns. The table is stored sparsely, so that rows in the same table can have crazily-varying columns, if the user likes.

A column name has the form ":

The example tables given are very similar to untyped relations. This has only just become part of the nightly build.

Via, Data Parallel.
Post a Comment