Andrew McCallum
Andrew McCallum is a professor and researcher in the computer science department at University of Massachusetts Amherst. His primary specialties are in machine learning, natural language processing, information extraction, information integration, and social network analysis.
McCallum graduated summa cum laude from Dartmouth College in 1989. He completed his Ph.D. at University of Rochester in 1995 under the supervision of Dana Ballard. He was then a postdoctoral fellow, working with Sebastian Thrun and Tom M. Mitchell at Carnegie Mellon University.
From 1998 to 2000 he was a Research Scientist and Research Coordinator at Justsystem Pittsburgh Research Center. From 2000 to 2002 he was Vice President of Research and Development at WhizBang Labs and Director of its Pittsburgh office.
In 2009 he was elected a fellow of the Association for the Advancement of Artificial Intelligence.
In collaboration with John Lafferty and Fernando Pereira, he developed conditional random fields. In 2011 the research paper introducing them won the "Test of Time" (10-year best paper) award at the International Conference on Machine Learning (ICML).
McCallum has written several widely used open-source software toolkits for machine learning, natural language processing and other text processing, including Rainbow, MALLET, and FACTORIE. In addition, he was instrumental in publishing the Enron Corpus, a large collection of emails that has been used as a basis for a number of academic studies of social networking and language.