Research Trends in the Fields of Arabic Natural Language Processing Tasks and Arabic Information Extraction Applications: A Survey Study
Abstract
This survey explores the literature on Arabic Natural Language Processing (NLP) tasks and Arabic Information Extraction (IE) applications in order to analyze state-of-the-art trends, identify research gaps in these fields, and recommend ways to fill those gaps. Relevant research articles in the targeted fields were gathered from academic search engines and academic databases. These articles were then surveyed to obtain information about three aspects of research trends: the contributions achieved, the methodologies applied, and the technical and linguistic resources utilized. The review followed systematic review procedures to meet the requirements of high-quality survey studies. The collected and reviewed articles cover a range of research contributions, for instance morphological resolution in the field of Arabic NLP tasks and Sentiment Analysis (SA) applications in the field of Arabic IE applications. The findings can be summarized as follows. First, most researchers in the field of Arabic NLP tasks prefer to contribute to Named Entity Recognition (NER) and then to morphological resolution tasks, whereas in the field of Arabic IE they prefer to contribute to SA applications and then to question answering applications. Second, most of the reviewed articles applied methodologies, tools, techniques, and algorithms that are not specific to any language, such as Machine Learning, Artificial Neural Networks, and Deep Learning algorithms. Lastly, this study provides the first comprehensive assessment of the associations between the domain types and ownership types of dataset sources, as well as the relation between articles' contribution fields and dataset ownership types. It confirms that, in the field of Arabic NLP tasks, the largest number of reviewed articles use existing and available dataset sources, specifically dataset sources in the Linguistic domain.
By contrast, in the field of Arabic IE applications, the largest number of reviewed articles are those whose authors collected and created the dataset sources themselves, again in the Linguistic domain.
License
Authors submitting articles to the IJITLS warrant that the work is original and that she/he is the author of the submission together with the named co-authors; to the extent the submission incorporates text passages, figures, data, or other material from the work of others, the submitting author has obtained any necessary permission.
Articles in this journal are published under the Creative Commons Attribution Licence (CC-BY 4.0).
By submitting an article, the author grants this journal the non-exclusive right to publish it. The author retains the copyright and the publishing rights for the article without any restrictions.