Arabic Stemmer System based on Rules of Roots
Abstract
Stemming is an automated process that produces a base string to represent a set of related words. It is a main data-processing step in many types of applications, such as text mining, information retrieval, and natural language processing. The task of a stemmer is to reduce words to their base form. The more such systems are used to analyse and understand the syntax and semantics of documents, the more accurate the results. Arabic stemming is not an easy task, because the morphological variants of certain words are not always semantically related. This paper introduces an Arabic stemmer system based on Arabic rules that extracts triliteral (three-radical) and quadriliteral (four-radical) roots and, where available, five-radical and six-radical roots. In addition, it compares the proposed stemmer with other stemmer systems and reports an evaluation by four specialists who are native Arabic speakers, in which the system achieved an accuracy of 96.8%.
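To make the idea of rule-based root extraction concrete, the following is a minimal sketch of pattern matching against Arabic word templates. It is not the system described in this paper, whose rules are not given in the abstract. The pattern table, the prefix list, and the function names `match_pattern` and `extract_root` are all assumptions chosen for illustration, and the sketch handles only undiacritized triliteral words.

```python
# Illustrative sketch only: NOT the paper's rule set.
# In each template the letters ف, ع, ل mark the three root slots;
# every other letter is a fixed part of the template.
SLOTS = "فعل"
PATTERNS = ["فاعل", "مفعول", "فعل"]   # hypothetical, tiny pattern table
PREFIXES = ["ال", "و"]                # hypothetical, tiny prefix list


def match_pattern(word, pattern):
    """Return the letters found in the root slots, or None if no match."""
    if len(word) != len(pattern):
        return None
    root = []
    for w, p in zip(word, pattern):
        if p in SLOTS:
            root.append(w)        # slot position: collect the radical
        elif w != p:
            return None           # fixed template letter must match exactly
    return "".join(root)


def extract_root(word):
    """Try the word as-is, then with one known prefix removed."""
    candidates = [word]
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 3:
            candidates.append(word[len(prefix):])
    for candidate in candidates:
        for pattern in PATTERNS:
            root = match_pattern(candidate, pattern)
            if root is not None:
                return root
    return None


for w in ["كاتب", "مكتوب", "الكاتب"]:
    print(w, extract_root(w))
```

Under these assumptions, all three sample words map to the same three radicals. A practical stemmer would need far larger pattern and affix tables, handling of weak letters and diacritics, and support for roots longer than three radicals.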
Published
2019-04-28
Issue
Section
Articles
License
Authors submitting articles to the IJITLS warrant that the work is original and that she/he is the author of the submission together with the named co-authors; to the extent the submission incorporates text passages, figures, data or other material from the work of others, the submitting author has obtained any necessary permission.
Articles in this journal are published under the Creative Commons Attribution Licence (CC-BY 4.0).
By submitting an article, the author grants this journal the non-exclusive right to publish it. The author retains the copyright and the publishing rights for the article without any restrictions.