Scaling Hadoop to 4000 nodes at Yahoo! "The 4000-node cluster throughput was 7 times better than 500’s for writes and 3.6 times better for reads even though the bigger cluster carried more (4 v/s 2 tasks) per node load than the smaller one." They did some performance monitoring of the cluster and achieved sorting 6TB of data in 37 minutes.
Update: Ex-Google and Yahoo employees create start-up (called Cloudera) and Hadoop Primer by Sun (PDF).
Update: Hadoop users group meeting with IBM talking about different join techniques and Cloudbase project which is an SQL abstraction over log files (tutorial PDF).
Update: Microsoft's first project is Hadoop.
No comments:
Post a Comment