Hadoop: MapReduce: WordCount: Pseudo-Distributed Cluster Environment
To list the files/folders under the HDFS root directory:
hadoop fs -ls /
Create a file named xyz.txt:
sudo gedit xyz.txt
Check whether it has been created (in /home/user_name/):
ls -l (or the common alias ll)
Verify the contents of xyz.txt:
cat xyz.txt
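As a sketch, the file can also be created non-interactively; the sample content below is hypothetical, and any text will do for the word-count demo:

```shell
# Write two hypothetical lines of sample text into xyz.txt
printf 'hello hadoop\nhello world\n' > xyz.txt
# Show the file contents, as the cat step above does
cat xyz.txt
```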
Before writing xyz.txt to HDFS, make sure it doesn't already exist in the Hadoop root directory:
hadoop fs -ls /
Write xyz.txt to HDFS:
hadoop fs -put xyz.txt /xyz.txt
Now check whether xyz.txt exists in the Hadoop root directory:
hadoop fs -ls /
To view file, block, and location info, use:
hadoop fsck /xyz.txt -files -blocks -locations
Go to the folder where your Java source files exist:
cd training_materials/developer/exercises/wordcount/
Compile all your Java files, with hadoop-core.jar on the classpath:
javac -classpath /usr/lib/hadoop/hadoop-core.jar *.java
Create a jar file from all the compiled Java classes (i.e., all .class files):
jar cvf wordcount.jar *.class
Now run the jar, specifying
- the class name that contains main()
- the input file/folder name (including path)
- the output folder name (including path)
hadoop jar wordcount.jar WordCount /xyz.txt /xyz_output1
Now list the /xyz_output1 output directory:
hadoop fs -ls /xyz_output1
Note: you will see _SUCCESS, _logs, and part-00000; part-00000 holds the final reducer output.
To view the output:
hadoop fs -cat /xyz_output1/part-00000
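To sanity-check the result without the cluster, the same counts can be reproduced locally with standard Unix tools. This is a sketch, and the xyz.txt contents here are hypothetical:

```shell
# Build a small hypothetical input file
printf 'hello hadoop\nhello world\n' > xyz.txt
# Split on whitespace, sort, count duplicates, print "word<TAB>count"
# (this mirrors the map -> shuffle/sort -> reduce flow of the MapReduce job)
tr -s ' \t' '\n' < xyz.txt | sort | uniq -c | awk '{print $2"\t"$1}'
# → hadoop  1
#   hello   2
#   world   1
```

The output format (one word and its count per line) matches what part-00000 contains after the job runs.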
To save all the commands you entered so far:
cd (returns to /home/user_name/)
history > command_history.txt