Computer science researchers at Stanford University have developed several new techniques for calculating Web-page rankings (as used in the Google search engine) five times faster.
Google currently ranks and searches 3 billion Web pages using a ranking algorithm called Computing PageRank. The program, however, can take several days to search and rank a billion Web pages. To speed the process, Stanford researchers developed a trio of techniques in numerical linear algebra. One uses extrapolation techniques, which make some assumptions about the Web's link structure that aren't true, but still permit quick and easy computation of PageRank. Because the assumptions are false, PageRank isn't completely accurate. However it is close enough to be refined using the original PageRank algorithm. Researchers have shown that their extrapolation techniques can speed up PageRank by 50%.
Another method, BlockRank, relies on a feature of the Web's link structure. The Stanford team found that approximately 80% of the pages on any given Web site point to other pages on the same site. That means they can compute many single-site PageRanks, glue them together, and use that as a starting point for the original PageRank algorithm. This could make PageRank computation three times faster.
Other Stanford findings show the rankings for some pages are calculated early in the PageRank process, while others of highly rated pages take much longer to compute. A method called Adaptive PageRank eliminates redundant computations associated with those pages whose PageRanks finish early. This speeds computation by as much as 50%. "Further speed-ups are possible when we use all these methods," says one Stanford researcher. Still, researchers say other issues must be solved. "We are closer to a topic-based PageRank than to a personalized ranking," he says.
Complex personalized rankings need even greater speed-ups to the PageRank calculations. And while a faster algorithm shortens computation time, storage problems remain. The results from a single PageRank computation on a few billion Web pages need several gigabytes of storage. Saving a limited number of topic-specific PageRank calculations is more practical, say researchers. For more information, go to the Stanford Database Group's Publication Server at http://dbpubs.stanford.edu/.