Hadoop: MapReduce: WordCount: Pseudo-Distributed Cluster Environment

To list the files/folders available under the Hadoop root directory:

hadoop fs -ls /


Create a file named xyz.txt:

sudo gedit xyz.txt


Check whether it has been created (in /home/user_name/):

ll   (i.e., lowercase LL, a common alias for ls -l)


Verify the contents of xyz.txt:

cat xyz.txt


Before writing xyz.txt to HDFS, make sure it doesn't already exist in the Hadoop root directory:

hadoop fs -ls /


Write xyz.txt to HDFS:

hadoop fs -put xyz.txt /xyz.txt


Now check whether xyz.txt exists in the Hadoop root directory:

hadoop fs -ls /


To view file, block, and block-location info, use:

hadoop fsck /xyz.txt -files -blocks -locations


Go to the folder where your Java programs exist:

cd training_materials/developer/exercises/wordcount/


Compile all your Java files, with hadoop-core.jar on the classpath:

javac -classpath /usr/lib/hadoop/hadoop-core.jar *.java


Create a jar file out of all the compiled Java classes (i.e., all .class files):

jar cvf wordcount.jar *.class


Now run the jar by specifying:

  1. the class name that contains main()
  2. the input file/folder name (including path)
  3. the output folder name (including path) — it must not already exist

hadoop jar wordcount.jar WordCount /xyz.txt /xyz_output1
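For reference, the WordCount job splits each input line into tokens, emits a (word, 1) pair for each token in the map phase, and sums the counts per word in the reduce phase. The real program uses Hadoop's Mapper/Reducer API; the plain-Java sketch below (class and method names are illustrative, not taken from the actual job) shows only that logic:

```java
import java.util.*;

public class WordCountLogic {
    // map phase: split a line into tokens and emit (word, 1) pairs
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // shuffle + reduce phase: group the pairs by word and sum the counts
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : new String[] {"hello hadoop", "hello world"}) {
            pairs.addAll(map(line));
        }
        // "hello" appears twice; "hadoop" and "world" once each
        System.out.println(reduce(pairs)); // {hadoop=1, hello=2, world=1}
    }
}
```

In the real job, Hadoop runs the map step on each input split in parallel and performs the grouping ("shuffle") across the cluster before calling the reducer.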


Now list the contents of /xyz_output1:

hadoop fs -ls /xyz_output1

Note: you will see _SUCCESS, _logs, and part-00000, where part-00000 holds the final reducer output.


To view the output:

hadoop fs -cat /xyz_output1/part-00000
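Each line of part-00000 is written by Hadoop's default TextOutputFormat as the key and value separated by a tab, e.g. a word, a tab, then its count. If you need to post-process the output, a small (hypothetical, for illustration only) parser could look like this:

```java
import java.util.*;

public class OutputLineParser {
    // parse one "word<TAB>count" line from part-00000
    // (TextOutputFormat's default key/value separator is a tab)
    static Map.Entry<String, Long> parse(String line) {
        int tab = line.indexOf('\t');
        String word = line.substring(0, tab);
        long count = Long.parseLong(line.substring(tab + 1));
        return new AbstractMap.SimpleEntry<>(word, count);
    }

    public static void main(String[] args) {
        Map.Entry<String, Long> entry = parse("hadoop\t3");
        System.out.println(entry.getKey() + " occurred " + entry.getValue() + " times");
    }
}
```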


To save all the commands you have entered so far:

cd   (returns you to the home directory, /home/user_name/)

history > command_history.txt

