Adversarial information retrieval
Adversarial information retrieval (adversarial IR) is a topic in information retrieval concerned with strategies for working with a data source, some portion of which has been maliciously manipulated. Tasks can include gathering, indexing, filtering, retrieving, and ranking information from such a data source. Adversarial IR includes the study of methods to detect, isolate, and defeat such manipulation.
On the Web, the predominant form of such manipulation is search engine spamming (also known as spamdexing), which uses various techniques to disrupt the activity of web search engines, usually for financial gain. Examples of spamdexing include link-bombing, comment or referrer spam, spam blogs (splogs), and malicious tagging. Reverse engineering of ranking algorithms, advertisement blocking, and web content filtering may also be considered forms of adversarial data manipulation.
Activities intended to poison the supply of useful data make search engines less useful to their users. If search engines respond by becoming more exclusionary, they risk becoming less dynamic and more like directories.
Topics
Topics related to Web spam (spamdexing):
- Link spam
- Keyword spamming
- Cloaking
- Malicious tagging
- Spam related to blogs, including comment spam, splogs, and ping spam
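As an illustration of one topic in the list above: cloaking serves a search engine spider different content from what a user's browser receives, so one simple way to look for it is to compare the two versions of a page. The sketch below is a minimal example under stated assumptions, not a method described in the literature cited here: it assumes both versions have already been fetched as plain text, and the function names, the word-set Jaccard similarity measure, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import re


def tokens(text: str) -> set:
    """Return the set of lowercase word tokens in a page's text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def looks_cloaked(crawler_view: str, browser_view: str, threshold: float = 0.5) -> bool:
    """Flag a page whose two views share too little vocabulary.

    The threshold is an illustrative assumption, not a tuned value.
    """
    return jaccard(tokens(crawler_view), tokens(browser_view)) < threshold


# Hypothetical page texts: what the spider was served vs. what a browser was served.
crawler_view = "cheap flights hotel deals cheap flights best prices cheap flights"
browser_view = "win a free phone now click here to claim your prize"

print(looks_cloaked(crawler_view, browser_view))  # True: the views share no words
print(looks_cloaked(browser_view, browser_view))  # False: the views are identical
```

A comparison like this would also flag legitimate pages that personalize or rotate their content, so it is better read as a first-pass filter than as a classifier.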
Other topics:
- Click fraud detection
- Reverse engineering of a search engine's ranking algorithm
- Web content filtering
- Advertisement blocking
- Stealth crawling
- Malicious tagging or voting in social networks
History
The term "adversarial information retrieval" was coined in 2000 by Andrei Broder (then Chief Scientist at AltaVista) during the Web plenary session at the TREC-9 conference.
External links
- AIRWeb: series of workshops on Adversarial Information Retrieval on the Web
- Web Spam Challenge: competition for researchers on Web Spam Detection
- Web Spam Datasets: datasets for research on Web Spam Detection