DOCUMENT CLUSTERING USING HADOOPS MAP REDUCE OPERATION

Authors

  • Mr. Vitthal Kumbhar*, Dr. Shyamrao Gumaste

Keywords:

Hadoop; K-means clustering; Map Reduce; Text mining; Hierarchical Clustering; TF-IDF

Abstract

Every day, internet users access data from various sources in the form of text, images, audio, and video. Data retrieval is not limited to these forms; it extends across a vast range of search tasks. To give better service to users, data provider organizations are looking for technology that addresses challenging issues such as accessing, storing, searching, sharing, transferring, and visually presenting data. Managing distributed unstructured data is impossible with a traditional relational database system. The proposed system manages big data in the form of text distributed among different text or PDF documents. The paper focuses on the use of the Map Reduce framework as the parallel computing system of Hadoop. The system proposes an implementation of TF-IDF weighting and k-means clustering on Hadoop, as well as hierarchical clustering of documents. Using Hadoop, the system reduces the computing time needed to cluster data compared with a computing system implemented in plain Java.
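The two building blocks named in the abstract, TF-IDF weighting and k-means clustering, can both be phrased as a map step followed by a reduce step. The following is a minimal sketch of that idea, not the authors' implementation: it uses plain Python in place of Hadoop and Java for brevity, and the function names (tf_idf, kmeans_map, kmeans_reduce, kmeans_iteration), the TF-IDF variant (relative term frequency times the natural log of N/df), and the squared Euclidean distance are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tf_idf(docs):
    """docs maps doc_id -> list of tokens; returns doc_id -> {term: weight}.

    Assumed weighting: (count / doc length) * ln(N / document frequency).
    """
    # Map-like step: each document contributes every distinct term once.
    # Reduce-like step: Counter sums those contributions per term.
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    n_docs = len(docs)
    weights = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        weights[doc_id] = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
    return weights


def kmeans_map(vector, centroids):
    """Map step: emit (index of the nearest centroid, vector)."""
    dists = [sum((a - b) ** 2 for a, b in zip(vector, c)) for c in centroids]
    return dists.index(min(dists)), vector


def kmeans_reduce(vectors):
    """Reduce step: the new centroid is the component-wise mean."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans_iteration(vectors, centroids):
    """One k-means pass; the dict of lists stands in for the shuffle phase."""
    groups = defaultdict(list)
    for vector in vectors:
        key, value = kmeans_map(vector, centroids)
        groups[key].append(value)
    # A centroid with no assigned vectors is left where it was.
    return [
        kmeans_reduce(groups[i]) if i in groups else list(c)
        for i, c in enumerate(centroids)
    ]
```

In a Hadoop job the map and reduce functions would run on different nodes and kmeans_iteration would be repeated until the centroids stop moving; here everything runs in one process so the data flow is easy to follow.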

Published

2015-07-30

Section

Articles