Best Approximate of Vector Space Model by Using SVD
Keywords: High-dimensional datasets, Dimensionality reduction, SVD, Vector space model.
Abstract: The rapid growth of internet technology makes it easy to collect huge volumes of data as text documents, e.g., journals, blogs, web pages, articles, and email messages. In text mining applications, the growing term space of such datasets is a heavy burden that makes it hard to pre-process documents efficiently for tasks such as document clustering. The proposed system focuses on document pre-processing and on a technique for reducing the document space, in order to prepare the documents for clustering. The common representation for text mining problems is the vector space model (VSM), in which each term represents a feature. The proposed system therefore builds the vector space model using a pre-processing method that removes trivial data from the dataset, while the huge dimensionality of the VSM is handled with a low-rank SVD. Experimental results show that the proposed system gives a document representation for clustering that is about 10% better than the previous approach.
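The pipeline outlined in the abstract (pre-processing, building a term-by-document VSM, then truncating its SVD) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the stop-word list, the raw term-count weighting, the sample documents, the rank k = 2, and all function names are assumptions made for the example.

```python
import re
import numpy as np

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "of", "and", "is", "in", "to", "for"}

def preprocess(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def build_vsm(docs):
    """Return (vocabulary, term-by-document count matrix)."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    index = {t: i for i, t in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(tokenized):
        for t in doc:
            matrix[index[t], j] += 1.0
    return vocab, matrix

def low_rank_svd(matrix, k):
    """Keep only the k largest singular triplets of the matrix."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]

# Made-up sample documents (assumption, not data from the paper).
docs = [
    "Clustering of text documents",
    "Text mining and document clustering",
    "Singular value decomposition of a matrix",
    "The matrix is decomposed for dimensionality reduction",
]

vocab, A = build_vsm(docs)
U_k, s_k, Vt_k = low_rank_svd(A, k=2)
A_k = U_k @ np.diag(s_k) @ Vt_k        # rank-2 approximation of A
doc_vectors = (np.diag(s_k) @ Vt_k).T  # one k-dimensional row per document
```

The rows of `doc_vectors` are the reduced document representations that a clustering algorithm would consume in place of the full term-space columns of `A`. Among all matrices of rank at most k, the truncated SVD gives the smallest Frobenius-norm error, which is the usual sense in which it is the "best approximation" of the VSM.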
Articles accepted for publication in Al-Mustansiriyah Journal of Science (MJS) are protected under the Creative Commons Attribution 4.0 International License (CC BY-NC). Authors of accepted articles are requested to sign a copyright release form prior to their article being published. All authors must agree to the submission, sign copyright release forms, and agree to be included in any correspondence between MJS and the authors before submitting a work to MJS. For personal or educational use, permission is given without charge to print or create digital copies of all or portions of a MJS article. However, copies must not be produced or distributed for monetary gain. It is necessary to respect the copyright of any parts of this work that are not owned by MJS.