Memory-Efficient Probabilistic Neuro-Symbolic Integration for Explainable Natural Language Inference Using Transformer-Based Foundation Models

Zahraa Sameer Ibrahim; Haedar Ahmed Mukhef; Hayder Hasan Ali

doi:10.23851/mjs.v37i2.1871

Authors

Zahraa Sameer Ibrahim, Department of Computer Science, College of Basic Education, Mustansiriyah University, Baghdad, Iraq. ORCID: https://orcid.org/0009-0003-0777-2038
Haedar Ahmed Mukhef, Department of Computer Science, College of Science, Mustansiriyah University, Baghdad, Iraq. ORCID: https://orcid.org/0009-0006-6373-2884
Hayder Hasan Ali, Department of Computer Science, College of Science, Mustansiriyah University, Baghdad, Iraq. ORCID: https://orcid.org/0009-0004-2941-5151

Keywords:

Explainable AI, Neuro-Symbolic AI, BERT, Transformer models, Memory optimization, e-SNLI

Abstract

Background: Transformer-based foundation models have achieved state-of-the-art results on a range of natural language inference (NLI) benchmarks, but their decision-making processes remain largely opaque. Closing this "explainability gap" is crucial for responsible AI adoption in high-risk industries that require transparency and trustworthiness. Furthermore, combining neural pattern matching with structured symbolic reasoning in resource-constrained settings remains an important open problem.
Objective: This study presents a memory-optimized probabilistic neuro-symbolic hybrid architecture that unifies transformer-based neural networks with logic-based symbolic reasoning systems.
Methods: We use the e-SNLI dataset, which provides human-written natural language explanations and reasoning highlights as training targets, and fine-tune the BERT transformer-based language model using gradient checkpointing, mixed-precision (FP16) training, and layer freezing to balance resource utilization against reasoning performance. All experiments were performed on a CUDA-compatible NVIDIA GPU with 8–12 GB of VRAM.
Results: The proposed framework achieves 80.6% accuracy on 3-way NLI classification (contradiction, entailment, and neutral), with precision, recall, and F1 scores of 0.806 on each class. Detailed class-level analysis shows high performance on entailment recognition (F1 = 0.912) and contradiction detection (F1 = 0.902), but slightly lower performance on neutral cases (F1 = 0.864). Ablation studies and confidence distributions of the model predictions indicate that memory-optimized models can maintain competitive performance while reducing GPU memory usage by ~60%, which allows deployment on resource-constrained devices.
Conclusions: The results indicate that neuro-symbolic systems operating under memory constraints can meet both explainability needs and the performance requirements of foundation models, an important step toward more trustworthy AI for NLP.
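To make the layer-freezing part of the Methods concrete, the following is a minimal arithmetic sketch of how freezing lower encoder layers shrinks the set of parameters that need gradients and optimizer state. It assumes the standard BERT-base configuration (12 layers, hidden size 768, vocabulary 30522) with a 3-way classification head; the abstract does not state which BERT variant or which layers were frozen, so the constants, the function names, and the choice of 8 frozen layers are all illustrative assumptions. It is not the authors' implementation and does not reproduce their reported memory savings.

```python
# Illustrative arithmetic only. Sizes follow the standard BERT-base
# configuration, which is an assumption (the abstract names only "BERT").
HIDDEN = 768
VOCAB = 30522
MAX_POS = 512
TYPE_VOCAB = 2
INTERMEDIATE = 3072
NUM_LAYERS = 12
NUM_LABELS = 3  # contradiction, entailment, neutral


def linear(n_in, n_out):
    """Weights plus biases of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(n):
    """Scale and shift vectors of a LayerNorm."""
    return 2 * n


def embedding_params():
    """Word, position, and token-type tables plus the embedding LayerNorm."""
    tables = (VOCAB + MAX_POS + TYPE_VOCAB) * HIDDEN
    return tables + layer_norm(HIDDEN)


def encoder_layer_params():
    """One transformer encoder layer: self-attention block plus feed-forward block."""
    attention = 4 * linear(HIDDEN, HIDDEN) + layer_norm(HIDDEN)  # Q, K, V, output
    feed_forward = (
        linear(HIDDEN, INTERMEDIATE)
        + linear(INTERMEDIATE, HIDDEN)
        + layer_norm(HIDDEN)
    )
    return attention + feed_forward


def trainable_params(frozen_layers, freeze_embeddings=True):
    """Parameters still receiving gradients under a hypothetical freezing scheme
    that freezes the lowest `frozen_layers` encoder layers."""
    if not 0 <= frozen_layers <= NUM_LAYERS:
        raise ValueError("frozen_layers must be between 0 and NUM_LAYERS")
    total = (NUM_LAYERS - frozen_layers) * encoder_layer_params()
    total += linear(HIDDEN, HIDDEN)      # pooler
    total += linear(HIDDEN, NUM_LABELS)  # classification head
    if not freeze_embeddings:
        total += embedding_params()
    return total


def total_params():
    """All parameters of the classifier when nothing is frozen."""
    return trainable_params(0, freeze_embeddings=False)


if __name__ == "__main__":
    full = total_params()
    for frozen in (0, 4, 8):
        kept = trainable_params(frozen)
        print(f"frozen layers={frozen:2d}  trainable={kept:,}  share={kept / full:.1%}")
```

Gradients and optimizer state scale with the trainable count, so a scheme like this lowers training memory. Gradient checkpointing and FP16 training act on a different cost, the stored activations, which this sketch does not model.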
