Monday, July 30, 2007

Scale Out or Drop Out

I recently came across this study by IBM that used Nutch/Lucene to compare scale-up (a single fast server with shared memory) against scale-out (a cluster). It showed that, for this type of workload, scale-out systems outperformed scale-up systems on both price and performance. The IEEE article about Google's architecture from late 2002 notes that Google was spending $278,000 for 88 dual-CPU 2 GHz Xeon machines, each with 2 GB of RAM and an 80 GB hard drive. In the IBM paper, $200,000 buys 112 quad-processor blades, each with 8 GB of memory and a 73 GB drive, plus a shared storage system.
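To make the two price points easier to compare, here is a quick back-of-the-envelope calculation. It is only a sketch: it takes the figures quoted above at face value, reads "quad processor" as four CPUs per blade, ignores inflation and the years between the two configurations, and folds the IBM shared storage system into the blade price, so the per-blade cost is somewhat overstated.

```python
# Figures as quoted in the post; not adjusted for inflation or for the gap
# between the 2002 Google numbers and the later IBM study. The IBM price also
# includes a shared storage system, which is spread across the blades here.
configs = {
    "Google rack (2002)": {"price": 278_000, "nodes": 88, "cpus_per_node": 2, "ram_gb_per_node": 2},
    # Assumption: "quad processor" read as 4 CPUs per blade.
    "IBM BladeCenter": {"price": 200_000, "nodes": 112, "cpus_per_node": 4, "ram_gb_per_node": 8},
}


def unit_costs(price, nodes, cpus_per_node, ram_gb_per_node):
    """Return (cost per node, cost per CPU, cost per GB of RAM) in dollars."""
    return (
        price / nodes,
        price / (nodes * cpus_per_node),
        price / (nodes * ram_gb_per_node),
    )


for name, c in configs.items():
    per_node, per_cpu, per_gb = unit_costs(**c)
    print(f"{name}: ${per_node:,.0f}/node, ${per_cpu:,.0f}/CPU, ${per_gb:,.0f}/GB RAM")
```

This works out to roughly $3,159 per Google server versus about $1,786 per IBM blade, and about $1,580 versus $446 per CPU, which is consistent with the post's point that commodity scale-out hardware keeps getting cheaper per unit of capacity.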

This scale-out approach was also discussed recently in Tim O'Reilly's post "MySQL: The Twelve Days of Scaleout."

Scale-up x Scale-out: A Case Study using Nutch/Lucene

The Nutch/Lucene search framework includes a parallel indexing operation written using the MapReduce programming model [2]. MapReduce provides a convenient way of addressing an important (though limited) class of real-life commercial applications by hiding parallelism and fault-tolerance issues from the programmers, letting them focus on the problem domain.
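As a rough illustration of what "parallel indexing written using the MapReduce programming model" means, here is a toy, single-process sketch that builds an inverted index with a map phase, a shuffle, and a reduce phase. It is not Nutch's actual indexer (Nutch runs on Java/Hadoop); the function names `map_doc`, `shuffle`, `reduce_term`, and `build_index` are hypothetical and chosen only to show the shape of the computation.

```python
from collections import defaultdict
from itertools import chain


def map_doc(doc_id, text):
    """Map phase: emit (term, doc_id) once for each distinct term in a document."""
    for term in set(text.lower().split()):
        yield term, doc_id


def shuffle(pairs):
    """Group intermediate values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_term(term, doc_ids):
    """Reduce phase: build the sorted posting list for one term."""
    return term, sorted(doc_ids)


def build_index(docs):
    # On a real cluster each map_doc and reduce_term call could run on a
    # different machine; here they simply run one after another.
    intermediate = chain.from_iterable(map_doc(d, t) for d, t in docs.items())
    return dict(reduce_term(k, v) for k, v in shuffle(intermediate).items())


docs = {1: "scale out clusters", 2: "scale up servers", 3: "clusters of blades"}
index = build_index(docs)
print(index["scale"])     # [1, 2]
print(index["clusters"])  # [1, 3]
```

The programmer only writes the map and reduce functions; distributing the calls, moving the intermediate pairs, and retrying failed tasks are the framework's job, which is the "hiding parallelism and fault-tolerance" the quote refers to.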

The query engine part consists of one or more front-ends, and one or more back-ends. Each back-end is associated with a segment of the complete data set...The front-end collects the response from all the back-ends to produce a single list of the top documents (typically 10 overall best matches).
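The front-end/back-end split described above is a scatter-gather pattern. The sketch below assumes, for illustration only, that each back-end returns scored `(score, doc_id)` hits for its own segment; `backend_search` and `frontend_merge` are hypothetical names, not Nutch APIs. The key observation is that each back-end only has to send its own top k, because the global top k is always contained in the union of the per-segment top-k lists.

```python
import heapq
from typing import List, Tuple

# A hit is (score, doc_id); higher scores are better matches.
Hit = Tuple[float, str]


def backend_search(segment: List[Hit], k: int) -> List[Hit]:
    """Back-end: return the k best hits from its own segment of the data."""
    return heapq.nlargest(k, segment)


def frontend_merge(partials: List[List[Hit]], k: int = 10) -> List[Hit]:
    """Front-end: combine every back-end's partial list into one global top-k."""
    return heapq.nlargest(k, (hit for partial in partials for hit in partial))


segments = [
    [(0.91, "a1"), (0.40, "a2"), (0.75, "a3")],
    [(0.88, "b1"), (0.95, "b2")],
    [(0.60, "c1"), (0.79, "c2"), (0.10, "c3")],
]
k = 3
partials = [backend_search(seg, k) for seg in segments]
print(frontend_merge(partials, k))
```

With k = 3 this prints `[(0.95, 'b2'), (0.91, 'a1'), (0.88, 'b1')]`. Adding back-ends shrinks each segment, which is why this kind of query workload suits a scale-out cluster.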

We see that the peak performance of the scale out solution (BladeCenter) is approximately 4 times better.
