By Kashyap Santoki
In today’s information-saturated world, the rapid growth of geographically distributed data calls for a system that can parse that data quickly and return meaningful results. A searchable index over the distributed data goes a long way toward speeding up the process. This article demonstrates how to use Lucene and Java for basic data indexing and searching, how to index and search with a RAM directory, how to create an index on data residing in HDFS, and how to search those indexes. The development environment consists of Java 1.6, Eclipse 3.4.2, Lucene 2.4.0, and Hadoop 0.19.1 running on Microsoft Windows XP SP3.
The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. Its Hadoop Distributed File System (HDFS) is designed for storing and sharing large files across clusters of networked machines. HDFS is built to run on commodity hardware and provides fault tolerance, resource management, and, most importantly, high-throughput access to application data.
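
To make the idea of a searchable in-memory index concrete, here is a minimal sketch in plain Java. It is not Lucene's API and not the code this article builds. The class name `InMemoryIndex` and its `add` and `search` methods are hypothetical, chosen only for illustration. The sketch shows the basic structure behind such an index: a map from each term to the set of documents that contain it.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Toy inverted index held entirely in memory (hypothetical class, not part of Lucene).
 * Maps each lowercased term to the sorted IDs of the documents that contain it.
 */
class InMemoryIndex {
    private final Map<String, Set<String>> postings = new HashMap<String, Set<String>>();

    /** Splits the text on non-alphanumeric characters and records docId under every term. */
    void add(String docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.length() == 0) {
                continue; // split() can yield an empty leading token
            }
            Set<String> ids = postings.get(token);
            if (ids == null) {
                ids = new TreeSet<String>();
                postings.put(token, ids);
            }
            ids.add(docId);
        }
    }

    /** Returns the IDs of documents containing the term, or an empty set if there are none. */
    Set<String> search(String term) {
        Set<String> ids = postings.get(term.toLowerCase(Locale.ROOT));
        if (ids == null) {
            return Collections.<String>emptySet();
        }
        return Collections.unmodifiableSet(ids);
    }

    public static void main(String[] args) {
        InMemoryIndex index = new InMemoryIndex();
        index.add("doc1", "Hadoop stores files in HDFS");
        index.add("doc2", "Lucene builds a searchable index");
        index.add("doc3", "An index over HDFS data speeds up search");
        System.out.println(index.search("hdfs"));
    }
}
```

A lookup here costs one hash access no matter how many documents were added, which is why indexing pays off compared with scanning every document on each query. A full-text library such as Lucene adds analysis, scoring, and persistent storage on top of this basic structure.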


