Subscribe by EmailTop Posts Today
Top Ranked Users
Invite your peers
|
Ever wonder how Google and Yahoo! can process millions of queries a day and get results back to you in usually less than a second. One tool they both use is called "Hadoop" . Hadoop is a software tool that runs on small to very large compute clusters (>2000 nodes). Hadoop is able to take an application and divide it into many small fragments of work, each of which may be executed or reexecuted on any node in the cluster. Each bit of data is processed separately and then brought back together using a Map/Reduce methodology. Hadoop goes a step further by providing its own distributed file system that stores data from each application across the various nodes within the cluster allowing for even greater aggregate bandwidth. This system allows for very high bandwidth and helps mitigate failures in a system. If a node within the cluster fails the job just gets submitted again to run on another node within the cluster. Compute nodes can be distributed among campuses which allows a large aggregate of compute resources to work together to perform work together. I can see this tool being used to do more than just search. It could be applied to other applications that need high performance computing. I can see a lot of bioinformatics/cheminformatics, oil discovery services, and financial based application programs being able to take advantage of such a framework. Hadoop would probably fall into the Grid computing category. See this post on the advantages of using a Grid model. It is free and available from Apache.org. Vassilios _____________________ Vassilios If you enjoyed this posting please subscribe to our RSS feed or submit it to your favorite social networks. Reply |
User loginNavigationKey word tags |
Recent comments
1 week 1 day ago
13 weeks 3 days ago
14 weeks 21 hours ago
14 weeks 21 hours ago
14 weeks 2 days ago
14 weeks 2 days ago
14 weeks 4 days ago
15 weeks 3 days ago
15 weeks 3 days ago
15 weeks 4 days ago