Arabic Letter Frequency
What gets counted in input Arabic text?
The Arabic alphabet consists of 28 primary letters, listed in positions 1 to 28 of Table 1. Written Arabic, however, also uses the eight modified letters listed in positions 29 to 36 of the same table. If these eight modified forms are folded back into the primary list on the basis of shape or phonetic similarity, the result is as shown in Table 2. For accurate frequency analysis, each of the 36 letters of Table 1 has its frequency counted independently. The ordering of the alphabet shown in the tables is more logical than the one used by the Unicode standard.
Although the full set of Arabic characters includes about ten diacritics, as shown in Figure 1, frequency analysis of Arabic characters is concerned only with the frequency of the alphabet letters shown in Table 2.
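The counting rule described above can be sketched in Python. This is a minimal illustration, not the method used to produce the figures in this article. It assumes that the 36 letters of Table 1 are the Unicode code points U+0621 to U+063A and U+0641 to U+064A, and that the eight modified forms are hamza, the three alef variants carrying hamza or madda, waw and yeh with hamza, teh marbuta, and alef maksura. The names `count_letters` and `relative_frequencies` are hypothetical.

```python
from collections import Counter

# Assumption: these eight code points are the "modified" letters of Table 1.
MODIFIED_LETTERS = {chr(cp) for cp in (0x0621, 0x0622, 0x0623, 0x0624,
                                       0x0625, 0x0626, 0x0629, 0x0649)}

# Assumption: U+0621..U+063A and U+0641..U+064A hold the 36 counted letters.
# Diacritics (U+064B and above) and the tatweel (U+0640) fall outside both ranges.
ALL_LETTERS = ({chr(cp) for cp in range(0x0621, 0x063B)}
               | {chr(cp) for cp in range(0x0641, 0x064B)})
PRIMARY_LETTERS = ALL_LETTERS - MODIFIED_LETTERS


def count_letters(text):
    """Count each of the 36 letters independently; skip every other character."""
    return Counter(ch for ch in text if ch in ALL_LETTERS)


def relative_frequencies(counts):
    """Convert raw counts into fractions of the total letter count."""
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()} if total else {}
```

Calling `count_letters` on a vocalized word counts only its base letters, because the diacritics are not members of `ALL_LETTERS`.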
Sources with over five million letters
The following well-known Arabic sources are used to generate enough data for meaningful frequency statistics.
- The first seven volumes of the series البداية والنهاية (The Beginning and The End) by Ibn Kathir. Together, these seven volumes fill 2,855 pages and contain 1,096,047 words, or 4,326,031 letters.
- The book الرحيق المختوم (The Sealed Nectar: the life of Prophet Mohammad PBUH) by Almubarakfuri. The book runs to 284 pages and contains 134,662 words, or 553,740 letters.
- The book تحفة العروسين (The Masterpiece of the Brides) by Al-shuri. The book runs to 239 pages and contains 66,550 words, or 242,361 letters.
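As a quick arithmetic check, the per-source figures quoted above can be tallied in Python. The sketch below only adds up the page, word, and letter counts given for each book; the dictionary keys and the function names `totals` and `letters_per_word` are illustrative choices, not part of the original study.

```python
# (pages, words, letters) as quoted for each source
SOURCES = {
    "The Beginning and The End, vols. 1-7": (2855, 1096047, 4326031),
    "The Sealed Nectar": (284, 134662, 553740),
    "The Masterpiece of the Brides": (239, 66550, 242361),
}


def totals(sources):
    """Sum pages, words, and letters across all sources."""
    pages, words, letters = (sum(column) for column in zip(*sources.values()))
    return pages, words, letters


def letters_per_word(sources):
    """Average number of letters per word for each source."""
    return {name: letters / words
            for name, (_pages, words, letters) in sources.items()}


print(totals(SOURCES))  # (3378, 1297259, 5122132)
```

The printed tuple matches the combined page, word, and letter totals reported for the three sources.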
Collectively, these sources add up to 3,378 pages, 1,297,259 words, and 5,122,132 letters. The following two figures show the letter frequency distribution for the counted letters: Figure 2 shows a histogram of the data sorted by Unicode value, and Figure 3 shows a histogram of the data sorted by frequency.
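The two orderings used for the histograms can be illustrated with a short Python sketch. It assumes the counts are held in a plain mapping from letter to count; the function names and the sample counts are made up for illustration and are not data from the sources above.

```python
def by_code_point(counts):
    """Order (letter, count) pairs by the letter's Unicode value."""
    return sorted(counts.items(), key=lambda item: ord(item[0]))


def by_frequency(counts):
    """Order pairs from most to least frequent; ties fall back to code point."""
    return sorted(counts.items(), key=lambda item: (-item[1], ord(item[0])))


# Made-up counts for lam (U+0644), alef (U+0627), and meem (U+0645).
sample = {"\u0644": 9, "\u0627": 5, "\u0645": 7}

print(by_code_point(sample))  # alef, lam, meem
print(by_frequency(sample))   # lam, meem, alef
```

The same counts feed both views; only the sort key changes.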
Qur'an letter and word frequency statistics
When the Qur'an is used as the data source for Arabic letter frequency generation, the frequency distribution of letters is closely in line with what is reported in Figures 2 and 3 above. The following list gives statistics for one of the most common print editions (the recitation of Hafs through Asim), which is also available online.
- Number of letters is 330,709
- Number of distinct words (counted without repetition) is 14,870
- Number of words in the entire Quran is 77,797
- Number of verses is 6,236
- The average word length in the Quran is 330,709 ÷ 77,797 ≈ 4.25 letters
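Statistics of the kind listed above can be computed with a short Python sketch. It is a rough illustration under stated assumptions: words are taken to be whitespace-separated tokens, only the 36 alphabet letters (assumed here to be U+0621 to U+063A and U+0641 to U+064A) count toward word length, and the function name `word_statistics` is hypothetical. The tokenization behind the quoted figures is not documented here and may differ.

```python
# Assumption: the 36 counted letters occupy U+0621..U+063A and U+0641..U+064A.
ARABIC_LETTERS = ({chr(cp) for cp in range(0x0621, 0x063B)}
                  | {chr(cp) for cp in range(0x0641, 0x064B)})


def word_statistics(text):
    """Return letter, word, and distinct-word counts plus average word length."""
    # Keep only alphabet letters inside each whitespace-separated token.
    words = ["".join(ch for ch in token if ch in ARABIC_LETTERS)
             for token in text.split()]
    words = [w for w in words if w]  # drop tokens with no Arabic letters
    letters = sum(len(w) for w in words)
    return {
        "letters": letters,
        "words": len(words),
        "distinct_words": len(set(words)),
        "average_word_length": letters / len(words) if words else 0.0,
    }


# The quoted average follows directly from the two quoted totals.
quoted_average = round(330_709 / 77_797, 2)
print(quoted_average)  # 4.25
```

Because diacritics are stripped before comparison, two spellings of a word that differ only in vocalization count as the same distinct word in this sketch.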
A detailed study of letter and word frequency across the entire Qur'an is provided by Intellaren Articles.
External links
- Tools to analyze Arabic text letters and words
- A detailed study of Statistical Distributions of Arabic Text Letters
- New layout Arabic Letter Keyboard Intellark produces all 48 printable Arabic characters intuitively.