How do you get started with Hadoop?

A year ago, I had to start a proof of concept (POC) on Hadoop, and I had no idea what Hadoop was.


Here is the approach I followed, which has helped others as well.


1. Go through some introductory videos on Hadoop

It's very important to have a high-level idea of Hadoop before you start working with it. Introductory videos help you understand the scope of Hadoop and the use cases where it can be applied. Plenty of them are available online, and going through any of them will be beneficial.


2. Understanding MapReduce

The second thing that helped me was understanding what MapReduce is and how it works. It is explained very nicely in this paper: http://static.googleusercontent....
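To make the idea concrete, here is a minimal in-memory sketch of the three MapReduce phases (map, shuffle, reduce) applied to word counting. It is plain Python and does not use Hadoop at all; the function names are my own, chosen for illustration, and a real cluster would run the map and reduce steps in parallel across machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog", "The end"]))
```

The point to notice is that the map and reduce functions only ever see one line or one key at a time, which is what lets a framework distribute them.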

3. Getting started with the Cloudera VM

Once you understand the basics of Hadoop, you can download the VM provided by Cloudera and start running some Hadoop commands on it. You can download the VM from this link: http://www.cloudera.com/content/...


It helps to get familiar with the basic Hadoop commands on the VM and to understand how they work.
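As a rough idea of what a first session might look like, the sketch below builds four common HDFS file system commands (create a directory, upload a file, list it, read it back) and runs them only if an `hdfs` client is found on the PATH. The file name `notes.txt` and the directory `/user/cloudera/demo` are hypothetical placeholders, and I am assuming Hadoop 2 or later, where `hdfs dfs` and the `-p` option of `-mkdir` are available (`hadoop fs` accepts the same file system options).

```python
import os
import shutil
import subprocess


def hdfs_command(*args):
    """Build the argument list for one `hdfs dfs` file system command."""
    return ["hdfs", "dfs", *args]


def starter_commands(local_file, hdfs_dir):
    """Return a typical first session: make a directory, upload, list, read."""
    file_name = os.path.basename(local_file)
    return [
        hdfs_command("-mkdir", "-p", hdfs_dir),
        hdfs_command("-put", local_file, hdfs_dir),
        hdfs_command("-ls", hdfs_dir),
        hdfs_command("-cat", f"{hdfs_dir}/{file_name}"),
    ]


if __name__ == "__main__":
    # Both paths below are placeholders; substitute your own.
    for command in starter_commands("notes.txt", "/user/cloudera/demo"):
        print(" ".join(command))
        if shutil.which("hdfs"):  # only run when a Hadoop client is installed
            subprocess.run(command, check=False)
```

On a machine without Hadoop the script simply prints the commands, which you can then type by hand in the VM's terminal.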


4. Setting up standalone/pseudo-distributed Hadoop

Once you are familiar with Hadoop on the VM, I would recommend setting up your own standalone Hadoop installation on your machine. The installation steps are explained very nicely in this blog post by Michael G. Noll: Running Hadoop On Ubuntu Linux (Single-Node Cluster).


5. Understanding the Hadoop Ecosystem

It also helps to get familiar with the other components in the Hadoop ecosystem, such as Apache Pig, Hive, HBase, Flume-NG and Hue. They all serve different purposes, and knowing a little about each is really helpful when building any product around the Hadoop ecosystem. You can install all of them easily on your machine and get started with them. The Cloudera VM has most of them installed already.


6. Writing MapReduce Jobs

Once you are done with steps 1-5, writing MapReduce jobs should not be much of a challenge. It is explained thoroughly in Hadoop: The Definitive Guide. If MapReduce really interests you, I would suggest reading the book Mining of Massive Datasets by Anand Rajaraman, Jure Leskovec and Jeffrey D. Ullman.
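If you want to try a job before writing Java, one option is Hadoop Streaming, which runs any program that reads lines from standard input and writes tab-separated key/value lines to standard output. The sketch below is a word count written in that style; it is not taken from the book, the script name `wordcount.py` is hypothetical, and it relies on the reducer input being sorted by key, which Hadoop Streaming does between the two phases.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one tab-separated 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum counts per word; input must already be sorted by key."""
    records = (
        line.rstrip("\n").split("\t", 1)
        for line in sorted_lines
        if "\t" in line  # skip blank or malformed lines
    )
    for word, group in groupby(records, key=lambda record: record[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # The first argument selects the role: "map" or "reduce".
    step = mapper if sys.argv[1] == "map" else reducer
    for record in step(sys.stdin):
        print(record)
```

You can test it locally without a cluster by piping a text file through `python wordcount.py map`, then `sort`, then `python wordcount.py reduce`, which mimics the map, shuffle and reduce sequence.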

