Wednesday, September 11, 2002

Scale This

DirectConnect claims to have reached one petabyte of data online (in March this year), although its current stats show more than three. This is apparently larger than the other P2P networks combined. Lucky for DirectConnect, they may be brought down. The architecture has a central hub, surrounded by hubs and clients. The schema and search protocol look fairly boring: no neat schemas or namespaces, just hardcoded file extensions.

Three of the better pieces of software for it:

The deep web (as of 2001) was estimated to be roughly 7.5 petabytes in size, 400 to 550 times larger than the surface web, and growing faster than it:

The article excludes the whole P2P thing from its calculations, the shallow/deep web being separate. For some perspective, Internet traffic in the US for the month of May reached 100 petabytes. Some surface web sizes include 100 terabytes for the Wayback Machine and 1 petabyte at Google.
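
To put those figures side by side, here is a rough sketch that converts each one to terabytes and shows it as a share of that one month of US traffic. It assumes decimal units (1 petabyte = 1000 terabytes) and takes "more than three" as exactly three; the names `sizes_tb`, `share_of_traffic`, and `report` are made up for this illustration.

```python
# Figures quoted in the post, converted to terabytes.
# Assumption: decimal units, 1 petabyte = 1000 terabytes.
TB_PER_PB = 1000

sizes_tb = {
    "DirectConnect (claimed, March)": 1 * TB_PER_PB,
    # The post says "more than three"; 3 PB is used here as a lower bound.
    "DirectConnect (current stats, at least)": 3 * TB_PER_PB,
    "Deep web (2001 estimate)": 7.5 * TB_PER_PB,
    "Wayback Machine": 100,
    "Google": 1 * TB_PER_PB,
}

# US Internet traffic for the month of May, as quoted in the post.
us_traffic_may_tb = 100 * TB_PER_PB


def share_of_traffic(size_tb, traffic_tb=us_traffic_may_tb):
    """Return a size as a percentage of one month of US traffic."""
    return 100.0 * size_tb / traffic_tb


def report():
    """Build one summary line per quoted figure."""
    return [
        f"{name}: {size:g} TB, {share_of_traffic(size):.1f}% of May traffic"
        for name, size in sizes_tb.items()
    ]


if __name__ == "__main__":
    print("\n".join(report()))
```

On these numbers, even the deep web estimate comes to well under a tenth of a single month of US traffic.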

To some extent, all this excludes high bandwidth communities like Mildura in Victoria: