A Transfer Learning Approach for Arabic Image Captions
DOI: https://doi.org/10.23851/mjs.v35i3.1485
Keywords: CNN, Computer Vision, LSTM, GRU, NLP
Abstract
Background: Arabic image captioning (AIC) is the automatic generation of Arabic-language text descriptions for images. It applies a transfer learning approach in deep learning that combines computer vision and natural language processing. Many datasets exist for English, in contrast to other languages; Arab researchers broadly agree that publicly available Arabic databases in this field are scarce. Objective: This paper presents the improvement and processing of the available Arabic textual database, using Google spreadsheets for translation, and the creation of the AR.Flicker8k2023 dataset, an extension of the existing Arabic Flicker8k dataset; it has been uploaded to GitHub and made public for researchers. Methods: An efficient model is proposed using deep learning techniques, incorporating two pre-trained models (VGG16 and VGG19) to extract features from the images and LSTM and GRU models to process the textual prediction sequence, in addition to examining the effect of Arabic text pre-processing. Results: The adopted model outperforms the previous study, improving BLEU-1 from 33 to 40. Conclusions: This paper concludes that the biggest problem is the limited availability of Arabic-language databases. This work increased the size of the text database from 24,276 to 32,364 captions, with four captions per image.
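The following is a minimal sketch of the kind of encoder-decoder captioning pipeline described in the Methods: a pre-trained VGG16 encoder for image features feeding an LSTM (or GRU) decoder over the Arabic caption sequence. It assumes a Keras/TensorFlow setup; vocab_size and max_length are placeholder values, and the layer sizes are illustrative rather than the exact configuration used in the paper.

```python
# Illustrative encoder-decoder sketch (not the paper's exact configuration).
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add

# 1) Image encoder: VGG16 without its classification head -> 4096-d features.
base = VGG16(weights="imagenet")
encoder = Model(inputs=base.inputs, outputs=base.layers[-2].output)

def extract_features(img_batch):
    """img_batch: array of shape (n, 224, 224, 3), RGB images."""
    return encoder.predict(preprocess_input(img_batch.astype("float32")))

# 2) Caption decoder: image features + partial Arabic caption -> next word.
vocab_size = 8000   # placeholder: Arabic vocabulary size after pre-processing
max_length = 35     # placeholder: longest caption length in tokens

img_in = Input(shape=(4096,))
img_feat = Dense(256, activation="relu")(Dropout(0.5)(img_in))

seq_in = Input(shape=(max_length,))
seq_emb = Embedding(vocab_size, 256, mask_zero=True)(seq_in)
seq_feat = LSTM(256)(Dropout(0.5)(seq_emb))  # replace LSTM with GRU for the GRU variant

decoder = Dense(256, activation="relu")(add([img_feat, seq_feat]))
out = Dense(vocab_size, activation="softmax")(decoder)

caption_model = Model(inputs=[img_in, seq_in], outputs=out)
caption_model.compile(loss="categorical_crossentropy", optimizer="adam")
caption_model.summary()
```

At inference time, captions would be generated word by word: start from a begin-of-sequence token, repeatedly predict the next word given the image features and the caption so far, and stop at an end token or at max_length. Generated captions can then be scored against the reference captions with BLEU-1.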
Data Availability Statement
Data is available in the article.
License
Copyright (c) 2024 Haneen Siraj Ibrahim, Narjis Mezaal Shati, AbdulRahman A. Alsewari
This work is licensed under a Creative Commons Attribution 4.0 International License.
(Starting May 5, 2024) Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution (CC-BY) 4.0 License that allows others to share the work with an acknowledgement of the work’s authorship and initial publication in this journal.