Running the sample application in standalone mode proves that things are working properly, but it isn't very exciting. To really demonstrate the power of Hadoop, you'll want to execute it in real cluster mode. The first step is to set up the site-specific configuration in hadoop-site.xml, shown in Listing 9.
Listing 9. hadoop-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>MACH1:8000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>MACH2:8000</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.secondary.info.port</name>
    <value>8001</value>
  </property>
  <property>
    <name>dfs.info.port</name>
    <value>8002</value>
  </property>
  <property>
    <name>mapred.job.tracker.info.port</name>
    <value>8003</value>
  </property>
  <property>
    <name>tasktracker.http.port</name>
    <value>8004</value>
  </property>
</configuration>
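As a quick sanity check, you can pull individual property values back out of a site file from the command line. The get_prop helper below is not part of Hadoop; it's a small sed-based sketch that assumes each <name> element is immediately followed by its <value> on the next line, as in the configuration above:

```shell
# get_prop: print the <value> that follows a given <name> in a Hadoop
# site file. Assumes <name> and <value> sit on adjacent lines, as in
# the listing above. Purely illustrative; not a Hadoop tool.
get_prop() {
  sed -n "/<name>$1<\/name>/{n;s:.*<value>\(.*\)</value>.*:\1:p;}" "$2"
}

# Demo against a copy of the configuration (written to a temp file here
# so the snippet is self-contained):
conf=$(mktemp)
cat > "$conf" <<'EOF'
<configuration>
<property>
<name>fs.default.name</name>
<value>MACH1:8000</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
EOF

get_prop fs.default.name "$conf"   # MACH1:8000
get_prop dfs.replication "$conf"   # 2
```

In practice you would point get_prop at conf/hadoop-site.xml in your Hadoop installation instead of a temp file.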
The cluster in this example consists of four machines:

MACH1
MACH2
MACH3
MACH4
With the configuration in place, start the distributed file system from MACH1, the NameNode host named in fs.default.name:

ssh MACH1
cd /hadoop0.17.1/
bin/start-dfs.sh

(When you're finished, you can shut the file system down with the bin/stop-dfs.sh command.)
Next, start the MapReduce daemons from MACH2, the JobTracker host named in mapred.job.tracker:

ssh MACH2
cd /hadoop0.17.1/
bin/start-mapred.sh

(These daemons can later be shut down with the bin/stop-mapred.sh command.)
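Before moving on, it can be handy to verify that the daemons are actually listening on the ports configured in hadoop-site.xml. The check_port helper below is not a Hadoop utility; it's a small sketch that uses bash's /dev/tcp pseudo-device, with the host names and ports taken from the configuration above:

```shell
# check_port: report whether host $1 is accepting TCP connections on
# port $2. Relies on bash's /dev/tcp redirection; not a Hadoop tool.
check_port() {
  if (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null; then
    echo "$1:$2 up"
  else
    echo "$1:$2 down"
  fi
}

check_port MACH1 8000   # NameNode RPC (fs.default.name)
check_port MACH1 8002   # NameNode Web interface (dfs.info.port)
check_port MACH2 8000   # JobTracker RPC (mapred.job.tracker)
check_port MACH2 8003   # JobTracker Web interface (mapred.job.tracker.info.port)
```

If any port reports "down", check the corresponding daemon's log files under the Hadoop logs directory before proceeding.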
You can now execute the EchoOche application exactly as described in the previous section; the difference is that the program will now run across a cluster of DataNodes. You can confirm this through the Web interface that ships with Hadoop: point your browser to http://localhost:8002. (The default NameNode Web port is actually 50070; to see why port 8002 is used here, take a closer look at the dfs.info.port setting in Listing 9.) You should see a page similar to the one in Figure 2, showing the details of the NameNode and the file system it manages.
This Web interface provides many details to browse through, including full statistics for your application. Hadoop ships with several Web interfaces by default; you can find their default URLs in hadoop-default.xml. For example, in this sample setup, http://localhost:8003 shows JobTracker statistics. (The default port is 50030.)
In this article, we've presented the fundamentals of MapReduce programming with the open source Hadoop framework. Hadoop accelerates the processing of large amounts of data by distributing the work across many nodes, delivering fast responses, and it can be customized to meet various development requirements and scaled out simply by adding nodes. The framework's extensibility and simplicity are the key differentiators that make it a promising tool for data processing.
Ravi Shankar is an assistant vice president of technology development, currently working in the financial industry. He is a Sun Certified Programmer and Sun Certified Enterprise Architect with 15 years of industry experience. He has been a presenter at international conferences like JavaOne 2004, IIWAS 2004, and IMECS 2006. Ravi served earlier as a technical member of the OASIS Framework for Web Services Implementation Committee. He spends most of his leisure time exploring new technologies.
Govindu Narendra is a technical architect pursuing development of parallel processing technologies and portal development in data warehousing. He is a Sun Certified Programmer.
Read more about Enterprise Java in JavaWorld's Enterprise Java section.