Enron Corpus
The Enron Corpus is a large database of over 600,000 emails generated by 158 employees of the Enron Corporation and acquired by the Federal Energy Regulatory Commission during its investigation after the company's collapse. A copy of the database was subsequently purchased for $10,000 by Andrew McCallum, a computer scientist at the University of Massachusetts. He released this copy to researchers, providing a trove of data that has been used for studies on social networking and computer analysis of language. The corpus is unusual in that it is one of the few large collections of real emails publicly available for study; such collections are typically bound by privacy and legal restrictions that make them prohibitively difficult to access.