Hadoop Pseudo-Distributed

Content

Configuration

Use the following:

conf/core-site.xml:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

conf/hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

conf/mapred-site.xml:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
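The three files can also be generated from the shell instead of edited by hand. This is only a sketch: the helper name write_site_xml and the CONF_DIR variable are made up for illustration, and it overwrites any existing files of the same name, so back them up first.

```shell
#!/bin/sh
# write_site_xml FILE NAME VALUE
# Hypothetical helper (not part of Hadoop): writes a site file that
# contains exactly one property.
write_site_xml() {
    cat > "$1" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$2</name>
    <value>$3</value>
  </property>
</configuration>
EOF
}

# Assumption: the config directory is ./conf unless CONF_DIR is set.
CONF_DIR="${CONF_DIR:-conf}"
mkdir -p "$CONF_DIR"

# Same names and values as listed above.
write_site_xml "$CONF_DIR/core-site.xml"   fs.default.name    hdfs://localhost:9000
write_site_xml "$CONF_DIR/hdfs-site.xml"   dfs.replication    1
write_site_xml "$CONF_DIR/mapred-site.xml" mapred.job.tracker localhost:9001
```

Run it from the Hadoop install directory so that conf/ resolves to the right place.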

Setup passphraseless ssh

Now check that you can ssh to the localhost without a passphrase:

$ ssh localhost

If you cannot ssh to localhost without a passphrase, execute the following commands:

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
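Two things can still get in the way here: running the cat command twice duplicates the key, and sshd commonly ignores authorized_keys when its permissions are too loose. The sketch below handles both. The helper name authorize_key is hypothetical. Also note that recent OpenSSH releases disable DSA keys by default, so on a newer system an RSA key (-t rsa) may be needed instead.

```shell
#!/bin/sh
# authorize_key PUBKEY_FILE AUTH_FILE
# Hypothetical helper: appends the public key to AUTH_FILE only when that
# exact line is not already present, then tightens permissions.
authorize_key() {
    pub="$1"
    auth="$2"
    auth_dir="$(dirname "$auth")"

    mkdir -p "$auth_dir"
    chmod 700 "$auth_dir"
    touch "$auth"

    # -F fixed strings, -x whole-line match, -f read the pattern from the key file
    grep -qxF -f "$pub" "$auth" || cat "$pub" >> "$auth"

    chmod 600 "$auth"
}
```

Usage would be authorize_key ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys, followed by ssh -o BatchMode=yes localhost true, which fails instead of prompting if a passphrase or password is still required.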

Execution

Format a new distributed-filesystem:

$ bin/hadoop namenode -format

Start the hadoop daemons:

$ bin/start-all.sh

The hadoop daemon log output is written to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).
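When a daemon does not come up, the logs are the first place to look. A small sketch for finding and tailing them follows. The function names are made up, and falling back to the current directory when HADOOP_HOME is unset is my assumption, not Hadoop behavior.

```shell
#!/bin/sh
# Resolve the log directory as described above:
# HADOOP_LOG_DIR if set, otherwise HADOOP_HOME/logs.
# Assumption: use the current directory when HADOOP_HOME is unset.
hadoop_log_dir() {
    echo "${HADOOP_LOG_DIR:-${HADOOP_HOME:-.}/logs}"
}

# show_hadoop_logs [LINES]
# Prints the last LINES lines (default 20) of each .log and .out file.
show_hadoop_logs() {
    dir="$(hadoop_log_dir)"
    for f in "$dir"/*.log "$dir"/*.out; do
        [ -f "$f" ] || continue   # skip unmatched glob patterns
        echo "==> $f <=="
        tail -n "${1:-20}" "$f"
    done
}
```

For example, show_hadoop_logs 50 prints the last 50 lines of every daemon log.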

Browse the web interface for the NameNode and the JobTracker; by default they are available at:

NameNode - http://localhost:50070/

JobTracker - http://localhost:50030/

Copy the input files into the distributed filesystem:

$ bin/hadoop fs -put conf input

Run some of the examples provided:

$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
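Running the example a second time fails if the output directory already exists in HDFS, so it has to be removed first. A sketch of a rerunnable wrapper is below. The function name and the HADOOP variable are hypothetical. fs -rmr is the recursive remove in this generation of Hadoop; newer releases use fs -rm -r instead.

```shell
#!/bin/sh
# Assumption: HADOOP points at the launcher script; defaults to bin/hadoop.
HADOOP="${HADOOP:-bin/hadoop}"

# run_grep_example PATTERN
# Removes any previous HDFS output directory, then runs the grep example.
run_grep_example() {
    # The remove fails when the directory is absent; that case is ignored.
    "$HADOOP" fs -rmr output >/dev/null 2>&1 || true
    "$HADOOP" jar hadoop-examples-*.jar grep input output "$1"
}
```

Call it as run_grep_example 'dfs[a-z.]+' from the Hadoop install directory.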

Examine the output files:

Copy the output files from the distributed filesystem to the local filesystem and examine them:

$ bin/hadoop fs -get output output
$ cat output/*

or

View the output files on the distributed filesystem:

$ bin/hadoop fs -cat output/*

When you’re done, stop the daemons with:

$ bin/stop-all.sh

Error

When I run “bin/start-all.sh”, I get the error “localhost: Error: JAVA_HOME is not set.”

start-all.sh  

This script is Deprecated. Instead use start-dfs.sh and start-mapred.sh
starting namenode, logging to /home/chenwq/hadoop/hadoop-0.21.0/bin/../logs/hadoop-root-namenode-ubuntu.out
localhost: Error: JAVA_HOME is not set.
localhost: Error: JAVA_HOME is not set.
starting jobtracker, logging to /home/chenwq/hadoop/hadoop-0.21.0/bin/../logs/hadoop-root-jobtracker-ubuntu.out
localhost: Error: JAVA_HOME is not set.

To fix this, set JAVA_HOME in conf/hadoop-env.sh.
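The "localhost:" prefix suggests the error comes from daemons launched over ssh, where a JAVA_HOME exported in the interactive shell is not necessarily visible, which is why the setting belongs in hadoop-env.sh. A sketch of adding it from the shell follows. The helper name set_java_home is made up, and the JDK path in the usage note is only a placeholder.

```shell
#!/bin/sh
# set_java_home ENV_FILE JAVA_DIR
# Hypothetical helper: appends an export line to ENV_FILE unless an
# uncommented JAVA_HOME export is already present.
set_java_home() {
    env_file="$1"
    java_dir="$2"
    if grep -q '^[[:space:]]*export JAVA_HOME=' "$env_file" 2>/dev/null; then
        echo "JAVA_HOME already set in $env_file"
    else
        echo "export JAVA_HOME=$java_dir" >> "$env_file"
    fi
}
```

For example, set_java_home conf/hadoop-env.sh /usr/lib/jvm/java-6-openjdk, with the second argument replaced by wherever the JDK actually lives on the machine. Afterwards run bin/start-all.sh again.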

Ref

Pseudo-Distributed Operation

localhost: Error: JAVA_HOME is not set.

Log

2012-07-27 Create

Written on July 27, 2012