Human Computer Information Retrieval

Human–computer information retrieval (HCIR) is the study of information retrieval techniques that bring human intelligence into the search process. The fields of human–computer interaction (HCI) and information retrieval (IR) have both developed innovative techniques to address the challenge of navigating complex information spaces, but their insights have often failed to cross disciplinary borders. Human–computer information retrieval has emerged in academic research and industry practice to bring together research in the fields of IR and HCI, in order to create new kinds of search systems that depend on continuous human control of the search process.

History

The term human–computer information retrieval was coined by Gary Marchionini in a series of lectures delivered between 2004 and 2006.[4] Marchionini’s main thesis is that "HCIR aims to empower people to explore large-scale information bases but demands that people also take responsibility for this control by expending cognitive and physical energy."

In 1996 and 1998, a pair of workshops at the University of Glasgow on information retrieval and human–computer interaction sought to address the overlap between these two fields. Marchionini notes the impact of the World Wide Web and the sudden increase in information literacy, changes that were only embryonic in the late 1990s.

A few workshops have focused on the intersection of IR and HCI. The Workshop on Exploratory Search, initiated by the University of Maryland Human-Computer Interaction Lab in 2005, alternates between the Association for Computing Machinery Special Interest Group on Information Retrieval (SIGIR) and Special Interest Group on Computer-Human Interaction (CHI) conferences. Also in 2005, the European Science Foundation held an Exploratory Workshop on Information Retrieval in Context. Then, the first Workshop on Human Computer Information Retrieval was held in 2007 at the Massachusetts Institute of Technology.

What is HCIR?

HCIR includes various aspects of IR and HCI. These include exploratory search, in which users generally combine querying and browsing strategies to foster learning and investigation; information retrieval in context (i.e., taking into account aspects of the user or environment that are typically not reflected in a query); and interactive information retrieval, which Peter Ingwersen defines as "the interactive communication processes that occur during the retrieval of information by involving all the major participants in information retrieval (IR), i.e. the user, the intermediary, and the IR system."[2]

A key concern of HCIR is that IR systems intended for human users be implemented and evaluated in a way that reflects the needs of those users.[5]

Most modern IR systems employ a ranked retrieval model, in which the documents are scored based on the probability of the document’s relevance to the query.[6] In this model, the system only presents the top-ranked documents to the user. These systems are typically evaluated based on their mean average precision over a set of benchmark queries from organizations like the Text Retrieval Conference (TREC).
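To make the evaluation measure above concrete, here is a minimal sketch of mean average precision. The function names and the toy queries and judgments are hypothetical, invented for illustration; this is not TREC's own tooling, only the standard textbook definition applied to made-up data.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average of precision@k over each rank k that holds a relevant document."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / k
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], judgments: Dict[str, Set[str]]
) -> float:
    """Mean of the per-query average precision over all benchmark queries."""
    if not runs:
        return 0.0
    total = sum(
        average_precision(ranked, judgments.get(query, set()))
        for query, ranked in runs.items()
    )
    return total / len(runs)


# Hypothetical toy data: ranked results and relevance judgments for two queries.
runs = {"q1": ["d1", "d2", "d3", "d4"], "q2": ["d5", "d6", "d7"]}
judgments = {"q1": {"d1", "d3"}, "q2": {"d6"}}
print(mean_average_precision(runs, judgments))
```

Note that the measure scores a single static ranking per query, which is why interactive systems call for additional evaluation methods.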

Because of its emphasis on using human intelligence in the information retrieval process, HCIR requires a different evaluation model, one that combines evaluation of the IR and HCI components of the system. A key area of research in HCIR involves evaluation of these systems. Early work on interactive information retrieval, such as Juergen Koenemann and Nicholas J. Belkin’s 1996 study of different levels of interaction for automatic query reformulation, leverages the standard IR measures of precision and recall but applies them to the results of multiple iterations of user interaction, rather than to a single query response.[3] Other HCIR research, such as Pia Borlund’s IIR evaluation model, applies a methodology more reminiscent of HCI, focusing on the characteristics of users, the details of experimental design, etc.[1]

Goals

Marchionini put forth the following goals towards a system where the user has more control in determining relevant results:[4]

Systems should aim to get people closer to the information they need, especially to the meaning; that is, systems can no longer only deliver the relevant documents, but must also provide facilities for making meaning with those documents.

Systems should increase user responsibility as well as control; that is, information systems require human intellectual effort, and good effort is rewarded.

Systems should have flexible architectures so they may evolve and adapt to increasingly more demanding and knowledgeable installed bases of users over time.

Systems should aim to be part of information ecology of personal and shared memories and tools rather than discrete standalone services.

Systems should support the entire information life cycle (from creation to preservation) rather than only the dissemination or use phase.

Systems should support tuning by end users and especially by information professionals who add value to information resources.

Systems should be engaging and fun to use.

In short, Marchionini seems to expect information retrieval systems to operate in the way that good libraries do. Systems should help users to bridge the gap between data or information (in the very narrow, granular sense of these terms) and knowledge (processed data or information that provides the context necessary to inform the next iteration of an information seeking process). That is, good libraries provide both the information a patron needs and a partner in the learning process (the information professional) to navigate that information, make sense of it, preserve it, and turn it into knowledge, which in turn creates new, more informed information needs. The HCIR process is cyclical in the same way, and aims to improve the whole of a user's information seeking experience.

Techniques

The techniques associated with HCIR emphasize representations of information that use human intelligence to lead the user to relevant results. These techniques also strive to allow users to explore and digest the dataset without penalty, i.e., without incurring unnecessary costs in time, mouse clicks, or context shifts.

Many search engines have features that incorporate HCIR techniques. Spelling suggestions and automatic query reformulation provide mechanisms for suggesting potential search paths that can lead the user to relevant results. These suggestions are presented to the user, putting control of selection and interpretation in the user’s hands.
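A minimal sketch of a spelling suggester follows, assuming a small hypothetical vocabulary and using string similarity from Python's standard `difflib` module. Real search engines typically rely on query logs and more elaborate models; the point here is only that the function returns candidates for the user to choose from and does not silently rewrite the query.

```python
import difflib
from typing import List

# Hypothetical vocabulary; a real engine might draw on its index or query logs.
VOCABULARY = ["retrieval", "relevance", "interaction", "information", "interface"]


def suggest_spellings(
    term: str, vocabulary: List[str] = VOCABULARY, limit: int = 3
) -> List[str]:
    """Return plausible replacements for a term that is not in the vocabulary."""
    term = term.lower()
    if term in vocabulary:
        return []  # Known word: nothing to suggest.
    # get_close_matches ranks candidates by similarity ratio, best first.
    return difflib.get_close_matches(term, vocabulary, n=limit, cutoff=0.7)


# The suggestions are shown to the user, who decides whether to accept one.
print(suggest_spellings("retreival"))
```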

Faceted search enables users to navigate information hierarchically, going from a category to its sub-categories, but choosing the order in which the categories are presented. This contrasts with traditional taxonomies in which the hierarchy of categories is fixed and unchanging. Faceted navigation, like taxonomic navigation, guides users by showing them available categories (or facets), but does not require them to browse through a hierarchy that may not precisely suit their needs or way of thinking.[7]
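The contrast with a fixed taxonomy can be sketched in a few lines. The catalogue, facet names, and helper functions below are hypothetical and chosen only for illustration: the user may refine by any facet in any order, and the system shows the remaining values of the other facets with their counts.

```python
from collections import Counter
from typing import Dict, List

Document = Dict[str, str]

# Hypothetical catalogue; facet names and values are invented for illustration.
DOCS: List[Document] = [
    {"title": "Intro to IR", "format": "book", "topic": "search", "year": "2008"},
    {"title": "Exploratory search", "format": "article", "topic": "search", "year": "2006"},
    {"title": "Usability testing", "format": "book", "topic": "hci", "year": "2006"},
    {"title": "Tag clouds", "format": "article", "topic": "hci", "year": "2008"},
]


def refine(docs: List[Document], selections: Dict[str, str]) -> List[Document]:
    """Keep only the documents that match every selected facet value."""
    return [d for d in docs if all(d.get(f) == v for f, v in selections.items())]


def facet_counts(docs: List[Document], facets: List[str]) -> Dict[str, Counter]:
    """Count the values still available under each facet in the current result set."""
    return {f: Counter(d[f] for d in docs if f in d) for f in facets}


# The user picks the order: year first, then format, or the other way round.
by_year_then_format = refine(refine(DOCS, {"year": "2006"}), {"format": "book"})
by_format_then_year = refine(refine(DOCS, {"format": "book"}), {"year": "2006"})
print(by_year_then_format == by_format_then_year)
print(facet_counts(refine(DOCS, {"topic": "search"}), ["format", "year"]))
```

Because each refinement is just a filter, the order of selection does not change the final result set, only the path the user takes to reach it.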

Lookahead provides a general approach to penalty-free exploration. For example, various web applications employ AJAX to automatically complete query terms and suggest popular searches. Another common example of lookahead is the way in which search engines annotate results with summary information about those results, including both static information (e.g., metadata about the objects) and "snippets" of document text that are most pertinent to the words in the search query.
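The query-completion form of lookahead can be sketched as a prefix match against past queries, ordered by popularity. The query log and the `complete` function are hypothetical; the sketch covers only the server-side lookup, not the asynchronous browser request that would call it as the user types.

```python
from typing import Dict, List

# Hypothetical query log: past query -> number of times it was issued.
QUERY_LOG: Dict[str, int] = {
    "java tutorial": 120,
    "java island": 45,
    "java coffee": 30,
    "javascript": 200,
    "jazz": 15,
}


def complete(prefix: str, log: Dict[str, int] = QUERY_LOG, limit: int = 3) -> List[str]:
    """Return the most popular logged queries that start with the typed prefix."""
    prefix = prefix.lower()
    matches = [q for q in log if q.startswith(prefix)]
    # Most frequent first; ties are broken alphabetically for stable output.
    matches.sort(key=lambda q: (-log[q], q))
    return matches[:limit]


print(complete("java"))
print(complete("java "))
```

The user sees likely destinations before committing to a query, which is what makes the exploration penalty-free.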

Relevance feedback allows users to guide an IR system by indicating whether particular results are more or less relevant.[8]
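The text does not name a specific method, so as an assumption the sketch below uses the classic Rocchio update, one well-known way to fold such judgments back into a query. The term-count vectors, the weights `alpha`, `beta`, and `gamma`, and the example texts are illustrative choices, not values taken from the source.

```python
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]


def term_vector(text: str) -> Vector:
    """Bag-of-words term counts for a whitespace-tokenized text."""
    return {t: float(c) for t, c in Counter(text.lower().split()).items()}


def centroid(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a list of sparse vectors."""
    if not vectors:
        return {}
    total: Vector = {}
    for vec in vectors:
        for term, weight in vec.items():
            total[term] = total.get(term, 0.0) + weight
    return {term: weight / len(vectors) for term, weight in total.items()}


def rocchio(
    query: Vector,
    relevant: List[Vector],
    non_relevant: List[Vector],
    alpha: float = 1.0,
    beta: float = 0.75,
    gamma: float = 0.15,
) -> Vector:
    """Move the query toward documents marked relevant and away from the rest."""
    pos, neg = centroid(relevant), centroid(non_relevant)
    terms = set(query) | set(pos) | set(neg)
    updated = {
        t: alpha * query.get(t, 0.0) + beta * pos.get(t, 0.0) - gamma * neg.get(t, 0.0)
        for t in terms
    }
    # Terms whose weight falls to zero or below are dropped from the new query.
    return {t: w for t, w in updated.items() if w > 0}


# Hypothetical feedback: the user marks one result relevant and one not relevant.
new_query = rocchio(
    term_vector("java"),
    relevant=[term_vector("java programming language")],
    non_relevant=[term_vector("java island indonesia")],
)
print(sorted(new_query.items()))
```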

Summarization and analytics help users digest the results that come back from the query. Summarization here is intended to encompass any means of aggregating or compressing the query results into a more human-consumable form. Faceted search, described above, is one such form of summarization. Another is clustering, which analyzes a set of documents by grouping similar or co-occurring documents or terms. Clustering allows the results to be partitioned into groups of related documents. For example, a search for "java" might return clusters for Java (programming language), Java (island), or Java (coffee).
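As a rough illustration of the "java" example, the sketch below groups result snippets with single-pass (leader) clustering on Jaccard term overlap. This is a deliberately simple stand-in, not a claim about how any particular search engine clusters results; the snippets, the threshold, and the function names are all hypothetical.

```python
from typing import List, Set


def tokens(text: str, ignore: Set[str]) -> Set[str]:
    """Lowercased word set, minus terms shared by every result (the query terms)."""
    return set(text.lower().split()) - ignore


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(
    snippets: List[str], query_terms: Set[str], threshold: float = 0.3
) -> List[List[str]]:
    """Assign each snippet to the first cluster whose leader is similar enough."""
    leaders: List[Set[str]] = []
    clusters: List[List[str]] = []
    for snippet in snippets:
        terms = tokens(snippet, query_terms)
        for i, leader in enumerate(leaders):
            if jaccard(terms, leader) >= threshold:
                clusters[i].append(snippet)
                break
        else:
            # No existing cluster is close enough: this snippet starts a new one.
            leaders.append(terms)
            clusters.append([snippet])
    return clusters


# Hypothetical result snippets for the query "java".
SNIPPETS = [
    "java programming language tutorial",
    "java island indonesia travel",
    "java coffee beans roast",
    "learn java programming language",
    "travel guide java island",
    "dark roast java coffee",
]
for group in cluster(SNIPPETS, {"java"}):
    print(group)
```

Removing the query term before comparing snippets matters here, since every result contains "java" and would otherwise look similar to every other.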

Visual representation of data is also considered a key aspect of HCIR. The representation of summarization or analytics may be displayed as tables, charts, or summaries of aggregated data. Other kinds of information visualization that allow users access to summary views of search results include tag clouds and treemapping.

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.