What's needed is a facility for searching MIT web pages. Being able to type "kerberos" and find pages that mention it is much easier than wandering around the web.
Searching is a much-needed service for users of MIT's campus-wide information system (CWIS). It's important to look at the options for providing this service and choose the one that best meets the need.
There are hundreds of possible solutions, but they fall into four basic categories.
Advantage: requires no development effort.
Disadvantage: adds dependence outside MIT.
Advantage: there are plenty to choose from.
Disadvantage: possible detriment to server performance.
Server performance can suffer because normal operation of the web server depends on the AFS cache. Many clients tend to request the same pages, so web.mit.edu can usually serve them from its cache on local disk rather than fetching them from an AFS server. A robot tends to flush that cache by asking for every page in the tree, one after another, so the popular pages have to be fetched from AFS again afterwards.
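The cache-flushing effect can be illustrated with a small simulation. This is only a sketch under stated assumptions: it models the AFS cache as a simple least-recently-used (LRU) cache, which may not match the real cache manager's replacement policy, and the page counts, cache size, and request pattern are invented for illustration.

```python
import random
from collections import OrderedDict


class LRUCache:
    """Toy LRU cache standing in for the AFS disk cache (an assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def fetch(self, page):
        if page in self.entries:
            # Cache hit: mark the page as most recently used.
            self.entries.move_to_end(page)
            self.hits += 1
        else:
            # Cache miss: the page would have to come from an AFS server.
            self.misses += 1
            self.entries[page] = True
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used


def hit_rate(cache, pages, requests, rng):
    """Serve `requests` random client requests for `pages`; return the hit rate."""
    hits_before = cache.hits
    for _ in range(requests):
        cache.fetch(rng.choice(pages))
    return (cache.hits - hits_before) / requests


def simulate(total_pages=10_000, cache_size=500, popular=200,
             warmup=5_000, window=200, seed=0):
    """Return (hit rate before a robot crawl, hit rate just after it)."""
    rng = random.Random(seed)
    all_pages = [f"/page{i}.html" for i in range(total_pages)]  # hypothetical tree
    popular_pages = all_pages[:popular]
    cache = LRUCache(cache_size)

    hit_rate(cache, popular_pages, warmup, rng)          # clients warm the cache
    before = hit_rate(cache, popular_pages, window, rng)

    for page in all_pages:                               # robot walks the whole tree
        cache.fetch(page)
    after = hit_rate(cache, popular_pages, window, rng)
    return before, after


if __name__ == "__main__":
    before, after = simulate()
    print(f"hit rate before crawl: {before:.2f}, after crawl: {after:.2f}")
```

In this model the popular pages are all cached before the crawl. Because the tree is much larger than the cache, the crawl evicts every one of them, and each must be fetched again the first time a client asks for it.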
Advantage:
Who wrote Harvest? See the press release.
The central parts of the Harvest system are the gatherer, which compiles a summary of the information in web pages, and the broker, which collects those summaries from the gatherer and builds a searchable index. Searches are sent to the broker. Read "Distributing the Gathering and Brokering Processes" to learn how Harvest can be configured to gather either across the network or from local disk.
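The division of labor between gatherer and broker can be sketched in a few lines. This is a toy illustration, not Harvest's actual code, summary format, or interfaces: the class names, the word-set summary, and the example URLs are all assumptions made for the sketch.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class Gatherer:
    """Toy gatherer: reduces each page to a compact summary (its set of words)."""

    def __init__(self, pages):
        # Mapping of URL -> page text; stands in for pages read from
        # local disk or fetched across the network.
        self.pages = pages

    def summaries(self):
        for url, text in self.pages.items():
            yield {"url": url, "words": set(tokenize(text))}


class Broker:
    """Toy broker: collects summaries from gatherers and answers queries."""

    def __init__(self):
        self.index = defaultdict(set)  # word -> set of URLs containing it

    def collect(self, gatherer):
        for summary in gatherer.summaries():
            for word in summary["words"]:
                self.index[word].add(summary["url"])

    def search(self, query):
        """Return the sorted URLs whose summaries contain every query word."""
        terms = tokenize(query)
        if not terms:
            return []
        # .get avoids creating empty index entries for unknown words.
        matches = set.intersection(*(self.index.get(t, set()) for t in terms))
        return sorted(matches)


if __name__ == "__main__":
    # Hypothetical pages, for illustration only.
    pages = {
        "http://www.example.edu/kerberos.html": "Kerberos is an authentication system.",
        "http://www.example.edu/afs.html": "AFS uses Kerberos tickets and a local cache.",
    }
    broker = Broker()
    broker.collect(Gatherer(pages))
    print(broker.search("kerberos"))
    print(broker.search("kerberos cache"))
```

Only the summaries travel from gatherer to broker, so in this sketch the broker never needs to read the pages themselves.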
The "Technical Discussion of the Harvest System" explains other design decisions that make Harvest scalable. For example, there's a system for efficiently replicating indexing information, so that searches aren't bottlenecked by a single server.
You can try my experimental broker through this query page.