Text Summarizing and Clustering Using Data Mining Technique
Keywords: Information Systems, Texts Summary, Large Data, Learning Machine, K-Means, TF-IDF
Text summarization is an important research topic in information technology because of the large volume of text and data found on the Internet and social media. Summarization has gained importance because extracting knowledge in various fields requires highly efficient methods; thus, there is a need for methods that summarize a single document or multiple documents. Summarization methods aim to capture the main content of a set of documents while reducing redundant information. This paper proposes an efficient text summarization method that relies on a word association algorithm to separate and merge sentences after summarizing them. It also uses data mining: the K-Means algorithm redistributes the information into clusters, and Term Frequency-Inverse Document Frequency (TF-IDF) measures the properties of the summarized texts. The experimental results show that deleting unimportant words yields good summarization ratios. The feature extraction method was also useful for grouping similar texts into clusters, which makes it possible to combine this method with other artificial intelligence methods, such as fuzzy logic or evolutionary algorithms, to increase summarization rates and accelerate clustering operations.
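The clustering stage mentioned in the abstract (TF-IDF features grouped by K-Means) can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenizer, the smoothed IDF variant, the farthest-point initialisation, the function names `tfidf_vectors` and `kmeans`, and the toy sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors), one vector per document."""
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenizer (assumption)
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF, one common variant: log((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty documents
        vec = [counts[t] / total * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Lloyd's algorithm with a deterministic farthest-point start (assumption)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Pick the vector that is farthest from its nearest existing centroid.
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
    return labels


# Toy input (hypothetical): two sentences about pets, two about markets.
docs = [
    "cat kitten pet food",
    "kitten pet toys cat",
    "stock market shares fall",
    "shares market stock rise",
]
_, vectors = tfidf_vectors(docs)
print(kmeans(vectors, k=2))
```

On this toy input the two pet sentences fall into one cluster and the two market sentences into the other, because the two topics share no vocabulary. A real pipeline would add proper tokenization, stop-word removal, and a principled choice of the number of clusters.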
Copyright (c) 2023 Al-Mustansiriyah Journal of Science
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Articles accepted for publication in Al-Mustansiriyah Journal of Science (MJS) are protected under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC). Authors of accepted articles are requested to sign a copyright release form before their article is published. All authors must agree to the submission, sign copyright release forms, and agree to be included in any correspondence between MJS and the authors before submitting a work to MJS. Permission is given without charge to print or create digital copies of all or portions of an MJS article for personal or educational use. However, copies must not be produced or distributed for monetary gain. The copyright of any parts of this work that are not owned by MJS must be respected.