Scheduling and Debugging Jobs in Hadoop
Scheduling
Job scheduling determines how long a job has to wait before its turn comes for execution. In a shared cluster environment, resources are shared among many users, so a good scheduler is essential. Production jobs must finish in a timely manner, while users who make smaller ad hoc queries should still get their results back in a reasonable time.
We can set a job’s priority for execution using the mapred.job.priority property or the setJobPriority() method. The values it accepts are:
- VERY_HIGH
- HIGH
- NORMAL
- LOW
- VERY_LOW
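To make the accepted values concrete, here is a minimal plain-Java sketch that reads mapred.job.priority from a java.util.Properties object and validates it against the five values listed above. It does not use any Hadoop classes: the class name JobPrioritySketch, the method fromProperties, and the NORMAL fallback when the property is missing are all assumptions made for illustration.

```java
import java.util.Locale;
import java.util.Properties;

// Plain-Java illustration only; this is NOT the Hadoop API.
class JobPrioritySketch {
    // Mirrors the five priority values listed above, highest first.
    enum Priority { VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW }

    // Reads mapred.job.priority from the given properties.
    // Falling back to NORMAL when the key is absent is an assumption of this sketch.
    static Priority fromProperties(Properties props) {
        String raw = props.getProperty("mapred.job.priority", "NORMAL");
        // valueOf throws IllegalArgumentException for anything outside the five values.
        return Priority.valueOf(raw.trim().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("mapred.job.priority", "high");
        System.out.println(fromProperties(props));
    }
}
```

An unknown value such as URGENT is rejected with an exception instead of being silently ignored, which is usually what we want for a setting that affects scheduling order.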
We have a choice of schedulers for MapReduce in Hadoop:
- Fair Scheduler
The purpose of the Fair Scheduler is to give every user a fair share of the available cluster capacity over time.
- If a single job is running, it gets all of the cluster.
- When multiple jobs are submitted, free task slots are given to the jobs in such a way that each user gets a fair share of the cluster.
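The fair-share idea can be sketched in a few lines of plain Java. This is a simplified model, not the Fair Scheduler's actual implementation: it hands out free slots one at a time to whichever user currently holds the fewest and still has waiting tasks. The class FairShareSketch, the allocate method, and the per-user demand map are hypothetical names chosen for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model of fair sharing; not the real Fair Scheduler code.
class FairShareSketch {
    // demand: user -> number of tasks waiting to run.
    // Returns user -> number of slots granted out of freeSlots.
    static Map<String, Integer> allocate(Map<String, Integer> demand, int freeSlots) {
        Map<String, Integer> granted = new LinkedHashMap<>();
        for (String user : demand.keySet()) {
            granted.put(user, 0);
        }
        while (freeSlots > 0) {
            // Pick the user with the fewest slots who still has unmet demand.
            String next = null;
            for (Map.Entry<String, Integer> e : demand.entrySet()) {
                int have = granted.get(e.getKey());
                if (have >= e.getValue()) {
                    continue; // this user's demand is already satisfied
                }
                if (next == null || have < granted.get(next)) {
                    next = e.getKey();
                }
            }
            if (next == null) {
                break; // nobody needs more slots
            }
            granted.put(next, granted.get(next) + 1);
            freeSlots--;
        }
        return granted;
    }

    public static void main(String[] args) {
        Map<String, Integer> demand = new LinkedHashMap<>();
        demand.put("alice", 20);
        demand.put("bob", 20);
        System.out.println(allocate(demand, 10));
    }
}
```

With a single user in the map, that user receives every free slot, which matches the single-job case described above. A user with a small demand only takes what they need, and the remainder goes to the others.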
- Capacity Scheduler
In the Capacity Scheduler, the cluster is made up of a number of queues, which may be hierarchical, so that a queue may be the child of another queue. Each queue has an allocated capacity. Within each queue, jobs are scheduled using First In First Out (FIFO) scheduling with priorities.
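"FIFO with priorities" inside a single queue can be illustrated with a small plain-Java sketch: jobs are ordered by priority first and by submission order second. This is only a model of the ordering rule, not the Capacity Scheduler's code. It ignores the queue hierarchy and does not enforce the capacity figure, and the names CapacityQueueSketch, submit, next, and capacityPercent are hypothetical.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Models the ordering inside one queue only; not the real Capacity Scheduler.
class CapacityQueueSketch {
    // Declared highest first, so the natural enum order puts VERY_HIGH ahead.
    enum Priority { VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW }

    static final class Job {
        final String name;
        final Priority priority;
        final long seq; // submission order, used as the FIFO tie-breaker

        Job(String name, Priority priority, long seq) {
            this.name = name;
            this.priority = priority;
            this.seq = seq;
        }
    }

    final String queueName;
    final int capacityPercent; // allocated share of the cluster (recorded, not enforced here)
    private long nextSeq = 0;
    private final PriorityQueue<Job> jobs = new PriorityQueue<>(
            Comparator.comparing((Job j) -> j.priority).thenComparingLong(j -> j.seq));

    CapacityQueueSketch(String queueName, int capacityPercent) {
        this.queueName = queueName;
        this.capacityPercent = capacityPercent;
    }

    void submit(String name, Priority priority) {
        jobs.add(new Job(name, priority, nextSeq++));
    }

    // Returns the name of the next job to run, or null if the queue is empty.
    String next() {
        Job j = jobs.poll();
        return j == null ? null : j.name;
    }

    public static void main(String[] args) {
        CapacityQueueSketch q = new CapacityQueueSketch("production", 70);
        q.submit("report-a", Priority.NORMAL);
        q.submit("report-b", Priority.NORMAL);
        q.submit("urgent-fix", Priority.HIGH);
        System.out.println(q.next() + ", " + q.next() + ", " + q.next());
    }
}
```

The HIGH job jumps ahead of the two NORMAL jobs, and the NORMAL jobs keep their submission order among themselves.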
Debugging
Debugging can be done in various ways and in various parts of the Hadoop ecosystem. When it comes to logs for analyzing the system, the following types are available:
System Daemon Logs
Each Hadoop daemon produces a logfile using log4j. It is written to the directory defined by the HADOOP_LOG_DIR environment variable. It is used by administrators.
HDFS Audit Logs
A log of all HDFS requests. It is written to the NameNode’s log, although this is configurable. It is used by administrators.
MapReduce Job History Logs
A log of the events that take place while a job is running. It is saved centrally on the JobTracker. It is used by users.
MapReduce Task Logs
Each TaskTracker child process produces a logfile using log4j, called syslog. It is written in the userlogs subdirectory. It is used by users.