Bijankhan Corpus
The Bijankhan corpus is a tagged corpus that is suitable for natural language processing research on the Persian language. The collection is gathered from daily news and common texts. All documents in the collection are categorized by subject, such as political or cultural, into about 4,300 different subject categories. The corpus contains about 2.6 million manually tagged words, annotated with a tag set of 550 Persian part-of-speech tags.
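
A part-of-speech tagged corpus pairs every word with a grammatical tag. The sketch below illustrates that idea by reading tagged tokens and counting how often each tag occurs. It assumes a hypothetical plain-text layout of one token per line, the word followed by whitespace and its tag; the actual distribution format of the Bijankhan corpus may differ, and the sample words and tag names (N_SING, P, V_PA, DELM) are illustrative only.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def read_tagged_tokens(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse hypothetical 'word TAG' lines into (word, tag) pairs."""
    tokens = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines (e.g. sentence separators)
        # Split on the last run of whitespace so the tag is the final field.
        parts = line.rsplit(maxsplit=1)
        if len(parts) != 2:
            continue  # skip malformed lines that lack a tag
        word, tag = parts
        tokens.append((word, tag))
    return tokens


def tag_distribution(tokens: List[Tuple[str, str]]) -> Counter:
    """Count how many tokens carry each part-of-speech tag."""
    return Counter(tag for _, tag in tokens)


# Hypothetical sample: the tags are illustrative, not taken from the corpus.
sample = [
    "کتاب N_SING",
    "دفتر N_SING",
    "را P",
    "خواندم V_PA",
    "",
    ". DELM",
]

if __name__ == "__main__":
    tokens = read_tagged_tokens(sample)
    for tag, count in tag_distribution(tokens).most_common():
        print(tag, count)
```

Run over a full tagged corpus, a tally like this is one way to see how many distinct tags are actually used and how the tokens are spread across them.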

The Bijankhan corpus was created by the Database Research Group at the University of Tehran. The corpus is non-free in that it is not free for commercial use, although these restrictions vary by country. The corpus is named after Prof. M. Bijankhan of the Faculty of Literature & Human Science at the University of Tehran, in recognition of his contributions to this area.
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.