Steven DeRose
Steven J. DeRose is a computer scientist with a significant history of contributions to computational linguistics and to key standards related to document processing, mostly around ISO's Standard Generalized Markup Language (SGML) and W3C's Extensible Markup Language (XML).
His contributions include the following:
- HyTime
- Text Encoding Initiative
- XPath (editor)
- XPointer (editor)
- XLink (editor)
- OSIS, the Open Scripture Information Standard (chairman)
- XML
While serving as Chief Scientist of the Brown University Scholarly Technology Group, he received NSF and NEH grants and contributed heavily to the Open eBook and Encoded Archival Description standards. Previously, he was co-founder and Chief Scientist at Electronic Book Technologies, Inc., where he designed the first SGML browser (DynaText), which earned 11 US patents and won multiple Seybold and other awards.
His 1987 article with James Coombs and Allen Renear, "Markup Systems and the Future of Scholarly Text Processing", is a seminal source for the theory of markup systems and has been widely cited and reprinted. In addition, he has published two books (Making Hypermedia Work: A User's Guide to HyTime and The SGML FAQ Book) and many articles, including several at Balisage and its predecessor conference series. He has also given keynote addresses at the ACM Conference on Very Large DataBases (VLDB) and a talk at the Text Encoding Initiative.
In computational linguistics, he is known for pioneering the use of dynamic programming methods for part-of-speech tagging (DeRose 1988, 1990).
Selected publications
- DeRose, Steven J. 1988. "Grammatical category disambiguation by statistical optimization." Computational Linguistics 14(1): 31–39.
- DeRose, Steven J. 1990. "Stochastic Methods for Resolution of Grammatical Category Ambiguity in Inflected and Uninflected Languages." Ph.D. Dissertation. Providence, RI: Brown University Department of Cognitive and Linguistic Sciences.
- DeRose, Steven J. and David G. Durand. 1994. Making Hypermedia Work: A User's Guide to HyTime. Kluwer Academic Publishers. ISBN 13: 9780792394327.
- DeRose, Steven J. 1997. The SGML FAQ Book. Kluwer Academic Publishers. ISBN 13: 9780792399438.