Testing Hadoop applications using MRUnit and other automation tools

Hadoop MapReduce jobs have a distinctive code architecture that follows a particular template with specific constructs. This architecture poses interesting problems when doing test-driven development (TDD) and writing unit tests. This is a real-world example using MRUnit, Mockito, and PowerMock. I'm going to touch on:

1) Using MRUnit to write JUnit tests for Hadoop MR applications.

2) Using PowerMock and Mockito to mock static methods.

3) Mocking out business logic contained in another class.

4) Verifying that the mocked-out business logic was called (or not).

5) Testing counters.

6) Testing statements in a log4j conditional block.

7) Handling exceptions in tests.

I assume the reader is already familiar with JUnit 4.

With MRUnit, you can craft test input, push it through your mapper and/or reducer, and verify its output, all in a JUnit test. As with other JUnit tests, this lets you debug your code using the JUnit test as a driver. A map/reduce pair can be tested using MRUnit's MapReduceDriver. A combiner can also be tested using MapReduceDriver. A PipelineMapReduceDriver lets you test a workflow of map/reduce jobs. Currently, partitioners do not have a test driver under MRUnit. MRUnit lets you do TDD and write lightweight unit tests that accommodate Hadoop's specific architecture and constructs.
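Since partitioners have no MRUnit driver, one option is to test the partitioning logic directly, because it is a pure function of the key and the number of partitions. The sketch below is a stdlib-only stand-in, not Hadoop code: SurfacePartitionSketch and partitionFor are hypothetical names, and the formula assumes the hash-modulo scheme of Hadoop's default HashPartitioner.

```java
// Hypothetical stand-in for a partitioner's getPartition logic; it does not use Hadoop.
class SurfacePartitionSketch {
    // Clears the sign bit so the result is never negative, then takes the modulo.
    static int partitionFor(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```

A plain JUnit test can then assert that the returned partition is within range and stable for a given key, with no driver involved.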




Example in MRUnit

In the following example, we're processing road surface data used to create maps. The input contains both linear surfaces (describing a stretch of road) and intersections (describing a road intersection). This mapper takes a collection of these mixed surfaces as input, discards anything that is not a linear road surface, i.e., intersections, and then processes each road surface and writes it out to HDFS. We want to keep count of how many non-road surfaces are input and eventually print them out. In addition, for debugging purposes, we will print out how many road surfaces were processed.

import java.io.IOException;
import java.util.List;

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.log4j.Logger;

public class MergeAndSplineMapper extends Mapper<LongWritable, BytesWritable, LongWritable, BytesWritable> {

    private static final Logger LOG = Logger.getLogger(MergeAndSplineMapper.class);

    enum SurfaceCounters {
        ROADS, NONLINEARS, UNKNOWN
    }

    @Override
    public void map(LongWritable key, BytesWritable value, Context context) throws IOException, InterruptedException {
        // A list of mixed surface types.
        LinkSurfaceMap lsm = (LinkSurfaceMap) BytesConverter.bytesToObject(value.getBytes());
        List<RoadSurface> mixedSurfaces = lsm.toSurfaceList();

        for (RoadSurface surface : mixedSurfaces) {
            Long surfaceId = surface.getNumericId();
            Enums.SurfaceType surfaceType = surface.getSurfaceType();

            if (surfaceType.equals(Enums.SurfaceType.INTERSECTION)) {
                // Ignore non-linear surfaces.
                context.getCounter(SurfaceCounters.NONLINEARS).increment(1);
                continue;
            } else if (!surfaceType.equals(Enums.SurfaceType.ROAD)) {
                // Ignore anything that is neither an INTERSECTION nor a ROAD.
                context.getCounter(SurfaceCounters.UNKNOWN).increment(1);
                continue;
            }

            PopulatorPreprocessor.processLinearSurface(surface);

            // Write out the processed linear surface.
            lsm.setSurface(surface);
            context.write(new LongWritable(surfaceId), new BytesWritable(BytesConverter.objectToBytes(lsm)));

            // Road surfaces are only counted when debug logging is enabled.
            if (LOG.isDebugEnabled()) {
                context.getCounter(SurfaceCounters.ROADS).increment(1);
            }
        }
    }
}

Breaking It Down

If you look back at our class under test, we are only inspecting the surface ID and surface type, discarding anything that is not a road surface, incrementing some counters, and processing road surfaces. Let's take a look at the first test, testMap_INTERSECTION().
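The branching described above can be modeled without Hadoop to make the expected counter values explicit. This is only a sketch under stated assumptions: SurfaceTypeSketch, CounterSketch, and MapperLogicSketch are hypothetical stand-ins for the author's RoadSurface types and Hadoop counters, and the processed field stands in for the call to PopulatorPreprocessor.processLinearSurface.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for the surface types and Hadoop counters used by the mapper.
enum SurfaceTypeSketch { ROAD, INTERSECTION, OTHER }
enum CounterSketch { ROADS, NONLINEARS, UNKNOWN }

class MapperLogicSketch {
    private final Map<CounterSketch, Integer> counters = new EnumMap<>(CounterSketch.class);
    int processed = 0; // how many surfaces reached the processing step

    void map(List<SurfaceTypeSketch> surfaces, boolean debugEnabled) {
        for (SurfaceTypeSketch type : surfaces) {
            if (type == SurfaceTypeSketch.INTERSECTION) {
                counters.merge(CounterSketch.NONLINEARS, 1, Integer::sum);
                continue; // intersections are counted and skipped
            } else if (type != SurfaceTypeSketch.ROAD) {
                counters.merge(CounterSketch.UNKNOWN, 1, Integer::sum);
                continue; // anything else is counted as unknown and skipped
            }
            processed++; // stands in for the static business-logic call
            if (debugEnabled) {
                counters.merge(CounterSketch.ROADS, 1, Integer::sum);
            }
        }
    }

    int count(CounterSketch counter) {
        return counters.getOrDefault(counter, 0);
    }
}
```

Feeding a single INTERSECTION leaves processed at 0 and NONLINEARS at 1; feeding a single ROAD with debugEnabled set to true gives processed 1 and ROADS 1. These are the two scenarios the tests in this post check.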

testMap_INTERSECTION

Our objective is to verify:

  • SurfaceCounters.NONLINEARS is incremented.

  • The for-loop continues, i.e., PopulatorPreprocessor.processLinearSurface(surface) is never called.

  • SurfaceCounters.ROADS and SurfaceCounters.UNKNOWN are not incremented.

Since this is a mapper, we start by defining and initializing a mapper driver. Note that the four type parameters defined for the MapDriver must match our class under test, i.e., MergeAndSplineMapper.

private MapDriver<LongWritable, BytesWritable, LongWritable, BytesWritable> mapDriver;

@Before
public void setUp() {
    MergeAndSplineMapper mapper = new MergeAndSplineMapper();
    mapDriver = new MapDriver<LongWritable, BytesWritable, LongWritable, BytesWritable>();
    mapDriver.setMapper(mapper);
}

Declaring IOException in the unit test method signature

The mapper could throw an IOException. In JUnit tests, you can handle exceptions thrown by the code under test by either catching them or throwing them. Keep in mind that we are not specifically testing exceptions here. I prefer not to catch the exception and instead have the unit test method throw it. If the unit test method encounters the exception, the test will fail, which is what we want. When you are not specifically testing exception handling, catching exceptions in unit tests leads to unnecessary clutter, logic, and maintenance, when you can simply throw the exception to fail the test.
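To make the trade-off concrete, here is a small stdlib-only sketch of both styles. Everything in it is hypothetical (ExceptionStyleSketch, surfaceCount, and the two test-like methods); it uses plain AssertionError rather than JUnit so that it runs on its own.

```java
import java.io.IOException;

class ExceptionStyleSketch {
    // Hypothetical code under test that may fail with a checked exception.
    static int surfaceCount(String input) throws IOException {
        if (input == null) {
            throw new IOException("no input");
        }
        return input.split(",").length;
    }

    // Style preferred in this post: declare the exception and let it fail the test.
    static void testDeclaring() throws IOException {
        if (surfaceCount("road,intersection") != 2) {
            throw new AssertionError("surface count incorrect");
        }
    }

    // Cluttered alternative: the same check wrapped in try/catch.
    static void testCatching() {
        try {
            if (surfaceCount("road,intersection") != 2) {
                throw new AssertionError("surface count incorrect");
            }
        } catch (IOException e) {
            throw new AssertionError("unexpected IOException", e);
        }
    }
}
```

Both methods check the same thing, but the second carries extra code whose only job is to turn the exception into a failure, which the test runner would have done anyway.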

@Test
public void testMap_INTERSECTION() throws IOException {

Initialize the test input to drive the test. To hit the if-block we want to test, we have to ensure the surface type is RoadType.INTERSECTION.

LinkSurfaceMap lsm = new LinkSurfaceMap();
RoadSurface rs = new RoadSurface(Enums.RoadType.INTERSECTION);
byte[] lsmBytes = append(lsm, rs);

We use PowerMock[3] to mock out a static call to the PopulatorPreprocessor class. PopulatorPreprocessor is a separate class containing business logic and is tested by its own JUnit test. At the class level, we set up PowerMock with the @RunWith annotation and tell it which classes to mock; in this case one, PopulatorPreprocessor. With @PrepareForTest, we tell PowerMock which classes have static methods that we want to mock. PowerMock supports both EasyMock and Mockito; since we're using Mockito, you'll see references to PowerMockito. We mock the static class by calling PowerMockito.mockStatic.

@RunWith(PowerMockRunner.class)
@PrepareForTest(PopulatorPreprocessor.class)

PowerMockito.mockStatic(PopulatorPreprocessor.class);

Set the test input generated previously and run the mapper:

mapDriver.withInput(new LongWritable(1234567), new BytesWritable(lsmBytes));
mapDriver.runTest();

Verify the output. SurfaceCounters.NONLINEARS is incremented once, and SurfaceCounters.ROADS and SurfaceCounters.UNKNOWN are not incremented. A quick review: with JUnit's assertEquals, the first parameter, an optional String, is the assertion error message. The second parameter is the expected value and the third parameter is the actual value. assertEquals prints a nice error message of the form "expected:<x> but was:<y>". So if the second assertion were to fire, for example, we would get the error message "java.lang.AssertionError: NONLINEARS count incorrect. expected:<1> but was:<0>".
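The parameter order and the shape of the failure message can be illustrated with a tiny stand-in. This is a hypothetical mimic written for illustration (AssertMessageSketch is not part of JUnit); it only assumes the message format quoted above.

```java
class AssertMessageSketch {
    // Mimics the (message, expected, actual) order and failure text; this is not JUnit itself.
    static void assertEquals(String message, long expected, long actual) {
        if (expected != actual) {
            throw new AssertionError(
                    message + " expected:<" + expected + "> but was:<" + actual + ">");
        }
    }
}
```

Swapping the expected and actual arguments would still fail the test, but the message would then report the values the wrong way round, which is why the order matters.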

Assert.assertEquals("ROADS count incorrect.", 0,
    mapDriver.getCounters().findCounter(SurfaceCounters.ROADS).getValue());
Assert.assertEquals("NONLINEARS count incorrect.", 1,
    mapDriver.getCounters().findCounter(SurfaceCounters.NONLINEARS).getValue());
Assert.assertEquals("UNKNOWN count incorrect.", 0,
    mapDriver.getCounters().findCounter(SurfaceCounters.UNKNOWN).getValue());

Verify that PopulatorPreprocessor.processLinearSurface(surface) was not called, using the following PowerMock/Mockito syntax.

PowerMockito.verifyStatic(Mockito.never());
PopulatorPreprocessor.processLinearSurface(rs);

testMap_ROAD

Now for our second test, testMap_ROAD(). Our objective is to verify:

  • SurfaceCounters.ROADS is incremented.

  • PopulatorPreprocessor.processLinearSurface(surface) is called.

  • SurfaceCounters.NONLINEARS and SurfaceCounters.UNKNOWN are not incremented.

The setup is identical to the first test, with a couple of exceptions.

1. Specifying the road type in our input data.

RoadSurface rs = new RoadSurface(Enums.RoadType.ROAD);

2. Setting the log4j debug level.

Interestingly, in our source code we only want to count road surfaces when the debug level is set in the log4j logger. To test this, we first save the original logging level, then we retrieve the root logger and set the level to DEBUG.

Level originalLevel = Logger.getRootLogger().getLevel();
Logger.getRootLogger().setLevel(Level.DEBUG);

We return to the original logging level at the end of the test to avoid impacting other tests.

Logger.getRootLogger().setLevel(originalLevel);
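One way to make sure the level is restored even when an assertion fails midway is to wrap the body in try/finally. The sketch below uses java.util.logging as a stdlib stand-in for log4j, with Level.FINE playing the role of DEBUG; LogLevelGuardSketch and runAtDebug are hypothetical names, and the same shape should carry over to the log4j calls shown above.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class LogLevelGuardSketch {
    // Temporarily lowers the logger to a debug-like level and always restores it.
    static boolean runAtDebug(Logger logger) {
        Level originalLevel = logger.getLevel(); // may be null when the level is inherited
        logger.setLevel(Level.FINE);
        try {
            // The debug-only branch of the code under test would run here.
            return logger.isLoggable(Level.FINE);
        } finally {
            logger.setLevel(originalLevel); // runs even if the body throws
        }
    }
}
```

Without the finally block, a failing assertion would leave the logger at the debug level and could change the behavior of later tests in the same JVM.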

Let's verify the output again. SurfaceCounters.ROADS is incremented once, and SurfaceCounters.NONLINEARS and SurfaceCounters.UNKNOWN are not incremented.

Assert.assertEquals("ROADS count incorrect.", 1,
    mapDriver.getCounters().findCounter(SurfaceCounters.ROADS).getValue());
Assert.assertEquals("NONLINEARS count incorrect.", 0,
    mapDriver.getCounters().findCounter(SurfaceCounters.NONLINEARS).getValue());
Assert.assertEquals("UNKNOWN count incorrect.", 0,
    mapDriver.getCounters().findCounter(SurfaceCounters.UNKNOWN).getValue());

Verify that PopulatorPreprocessor.processLinearSurface(surface) was called once, using the following PowerMock/Mockito syntax.

PowerMockito.verifyStatic(Mockito.times(1));
PopulatorPreprocessor.processLinearSurface(rs);

Testing a reducer

The same principles apply as in testing a mapper. The difference is that we create a ReduceDriver and populate it with our reducer class under test, as shown below.

private ReduceDriver<LongWritable, BytesWritable, LongWritable, BytesWritable> reduceDriver;

@Before
public void setUp() {
    MyReducer reducer = new MyReducer();
    reduceDriver = new ReduceDriver<LongWritable, BytesWritable, LongWritable, BytesWritable>();
    reduceDriver.setReducer(reducer);
}

Maven pom dependencies

In addition to JUnit 4, you will have to include the following dependencies in your Maven pom.xml. Take note of the supported versions of Mockito on the PowerMock web page.

<dependency>
    <groupId>org.apache.mrunit</groupId>
    <artifactId>mrunit</artifactId>
    <version>0.8.0-incubating</version>
    <scope>test</scope>
</dependency>

<dependency>
    <groupId>org.mockito</groupId>
    <artifactId>mockito-all</artifactId>
    <version>1.9.0-rc1</version>
    <scope>test</scope>
</dependency>

<dependency>
    <groupId>org.powermock</groupId>
    <artifactId>powermock-module-junit4</artifactId>
    <version>1.4.12</version>
    <scope>test</scope>
</dependency>

<dependency>
    <groupId>org.powermock</groupId>
    <artifactId>powermock-api-mockito</artifactId>
    <version>1.4.12</version>
    <scope>test</scope>
</dependency>

Running in Eclipse

The test is run just as any other JUnit test would be run, for example from inside Eclipse.

Conclusion

MRUnit provides a powerful and lightweight approach to test-driven development. A nice side effect is that it helps move you toward better code coverage than was previously possible.
