TrustRank
TrustRank is a link analysis technique described in a paper by Stanford University and Yahoo! researchers for semi-automatically separating useful webpages from spam.
Many Web spam pages are created only with the intention of misleading search engines. These pages, chiefly created for commercial reasons, use various techniques to achieve higher-than-deserved rankings on the search engines' result pages. While human experts can easily identify spam, it is too expensive to manually evaluate a large number of pages.
One popular method for improving rankings is to increase artificially the perceived importance of a document through complex linking schemes. Google's PageRank and similar methods for determining the relative importance of Web documents have been subjected to manipulation.
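As a rough illustration of how a linking scheme can inflate perceived importance, the sketch below runs a textbook damped PageRank on a tiny hypothetical graph, first without and then with a small "link farm" pointing at one page. The function name `page_rank`, the damping factor of 0.85, the iteration count, and all page names are assumptions made for this example; it is not a description of how any production search engine computes rankings.

```python
def page_rank(links, damping=0.85, iterations=50):
    """Textbook damped PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        nxt = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                # Each page splits its damped rank evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for q in targets:
                    nxt[q] += share
        rank = nxt
    return rank


# Hypothetical three-page web in which "b" and "target" are linked identically.
honest = {"a": ["b", "target"], "b": ["a"], "target": ["a"]}

# The same web plus five hypothetical farm pages that only link to "target".
farmed = dict(honest)
for i in range(5):
    farmed[f"farm{i}"] = ["target"]

before = page_rank(honest)
after = page_rank(farmed)
print(before["target"], before["b"])
print(after["target"], after["b"])
```

In the honest graph, "b" and "target" receive the same score because they have identical inlinks. Once the farm pages are added, "target" pulls ahead of "b" even though nothing about its content has changed.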
The TrustRank method calls for selecting a small set of seed pages to be evaluated by an expert. Once the reputable seed pages are manually identified, a crawl extending outward from the seed set seeks out similarly reliable and trustworthy pages. TrustRank's reliability diminishes as the distance between a document and the seed set increases.
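The seed-and-propagate idea can be sketched as a PageRank-style iteration in which the random-jump step always returns to the hand-picked seeds. This is a minimal sketch under stated assumptions, not the researchers' implementation: the function name `trust_rank`, the damping factor of 0.85, the even split of trust among outlinks, the iteration count, and the example graph are all chosen here for illustration.

```python
def trust_rank(links, seeds, damping=0.85, iterations=20):
    """Propagate trust outward from a set of manually vetted seed pages.

    links maps each page to the list of pages it links to;
    seeds is the set of pages an expert has judged reputable.
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    # Initial trust is spread evenly over the seeds; all other pages start at 0.
    seed_score = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(seed_score)
    for _ in range(iterations):
        # Each round, only the seeds receive a fresh injection of trust.
        nxt = {p: (1.0 - damping) * seed_score[p] for p in pages}
        for page, targets in links.items():
            if targets:
                # A page passes a damped, evenly split share of its trust along its links.
                share = damping * trust[page] / len(targets)
                for q in targets:
                    nxt[q] += share
        trust = nxt
    return trust


# Hypothetical graph: a chain leading away from one seed, plus two spam pages
# that link only to each other and are never linked from the trusted side.
links = {
    "seed": ["a"],
    "a": ["b"],
    "b": ["c"],
    "c": [],
    "spam1": ["spam2"],
    "spam2": ["spam1"],
}
scores = trust_rank(links, seeds={"seed"})
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, scores[page])
```

In this toy graph the scores fall off along the chain from "seed" to "c", matching the observation that trust weakens with distance from the seed set, while the two spam pages, unreachable from any seed, receive no trust at all.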
The researchers who proposed the TrustRank methodology have continued to refine their work by evaluating related topics, such as measuring spam mass.
See also
- PageRank
- CheiRank
- Adversarial information retrieval
- Hilltop algorithm
- Spamdexing
External links
- Z. Gyöngyi, H. Garcia-Molina, J. Pedersen: Combating Web Spam with TrustRank
- Link-based spam detection: Yahoo!-assigned patent application using TrustRank
- TrustRank algorithm explained