2 The Basics of a MapReduce Job

2.1 The Parts of a Hadoop MapReduce Job
    2.1.1 Input Splitting
    2.1.2 A Simple Map Function: IdentityMapper
2.2 Configuring a Job
    2.2.1 Specifying Input Formats
    2.2.2 Setting the Output Parameters
    2.2.3 Configuring the Reduce Phase
2.3 Running a Job
2.4 Creating a Custom Mapper and Reducer

In this chapter we look at a MapReduce job as a whole. After reading it, you will be able to write and run MapReduce job programs in standalone (local) mode. The examples in this chapter assume that you have completed the setup described in Chapter 1. You can run them on a single machine using a dedicated local-mode configuration; you do not need to start the Hadoop Core framework. The local-mode configuration is ideal for debugging and unit testing. You can download the example code from this book's page on the Apress website. The download also includes a JAR file for running the examples. Let's begin by looking at the essential parts of a MapReduce job.

2.1 The Parts of a Hadoop MapReduce Job

The user configures a MapReduce job (or simply, a job) and submits it to the framework. A MapReduce job consists of a set of map tasks, a shuffle, a sort, and a set of reduce tasks. The framework then manages the distribution and execution of the job, collects the output, and reports the result back to the user.

The user is responsible for setting up the job, specifying the input location and the input itself, and ensuring that the input format and location are correct. The framework is responsible for distributing the job among the TaskTracker nodes of the cluster; running the map, shuffle, sort, and reduce phases; writing the output to the output directory; and informing the user of the job's completion status.

All of the examples in this chapter are based on the file MapReduceIntro.java, shown in Listing 2-1. The job built by this code reads its input line by line and sorts the lines on the portion of each line before the first tab character; if a line contains no tab character, the framework sorts on the entire line. MapReduceIntro.java is a simple example of configuring and running a MapReduce job.

Listing 2-1. MapReduceIntro.java

package com.apress.hadoopbook.examples.ch2;

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;
import org.apache.log4j.Logger;

/**
 * A very simple MapReduce example that reads textual input where each record is
 * a single line, and sorts all of the input lines into a single output file.
 *
 * The records are parsed into Key and Value using the first TAB character as a
 * separator. If there is no TAB character the entire line is the Key.
 *
 * @author Jason Venner
 */
public class MapReduceIntro {
    protected static Logger logger = Logger.getLogger(MapReduceIntro.class);

    /**
     * Configure and run the MapReduceIntro job.
     *
     * @param args
     *            Not used.
     */
    public static void main(final String[] args) {
        try {
            /**
             * Construct the job conf object that will be used to submit this
             * job to the Hadoop framework. Ensure that the jar or directory
             * that contains MapReduceIntroConfig.class is made available to all
             * of the TaskTracker nodes that will run maps or reduces for this
             * job.
             */
            final JobConf conf = new JobConf(MapReduceIntro.class);

            /**
             * Take care of some housekeeping to ensure that this simple example
             * job will run.
             */
            MapReduceIntroConfig.exampleHouseKeeping(conf,
                    MapReduceIntroConfig.getInputDirectory(),
                    MapReduceIntroConfig.getOutputDirectory());

            /**
             * This section is the actual job configuration portion.
             */
            /**
             * Configure the inputDirectory and the type of input. In this case
             * we are stating that the input is text, each record is a single
             * line, and the first TAB is the separator between the key and the
             * value of the record.
             */
            conf.setInputFormat(KeyValueTextInputFormat.class);
            FileInputFormat.setInputPaths(conf,
                    MapReduceIntroConfig.getInputDirectory());

            /**
             * Inform the framework that the mapper class will be the
             * {@link IdentityMapper}. This class simply passes the input Key
             * Value pairs directly to its output, which in our case will be the
             * shuffle.
             */
            conf.setMapperClass(IdentityMapper.class);

            /**
             * Configure the output of the job to go to the output directory.
             * Inform the framework that the Output Key and Value classes will
             * be {@link Text} and the output file format will be
             * {@link TextOutputFormat}. The TextOutputFormat class produces a
             * record of output for each Key, Value pair, with the following
             * format: Formatter.format("%s\t%s%n", key.toString(),
             * value.toString());.
             *
             * In addition, indicate to the framework that there will be 1
             * reduce. This results in all input keys being placed into the
             * same, single, partition, and the final output being a single
             * sorted file.
             */
            FileOutputFormat.setOutputPath(conf,
                    MapReduceIntroConfig.getOutputDirectory());
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(Text.class);
            conf.setNumReduceTasks(1);

            /**
             * Inform the framework that the reducer class will be the
             * {@link IdentityReducer}. This class simply writes an output
             * record key, value record for each value in the key, value set it
             * receives as input. The value ordering is arbitrary.
             */
            conf.setReducerClass(IdentityReducer.class);

            logger.info("Launching the job.");
            /**
             * Send the job configuration to the framework and request that the
             * job be run.
             */
            final RunningJob job = JobClient.runJob(conf);
            logger.info("The job has completed.");
            if (!job.isSuccessful()) {
                logger.error("The job failed.");
                System.exit(1);
            }
        } catch (final IOException e) {
            // The source listing is cut off after System.exit; this minimal
            // handler is required because JobClient.runJob throws IOException.
            logger.error("The job has failed due to an IO error", e);
            System.exit(1);
        }
    }
}
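The listing above depends on the rule that KeyValueTextInputFormat splits each input line at the first tab character: everything before the tab is the key, everything after it is the value, and a line with no tab becomes a key with an empty value. As a rough illustration of that rule outside Hadoop, here is a minimal plain-Java sketch; KeyValueSplit is a hypothetical helper written for this example, not part of the Hadoop API.

```java
public class KeyValueSplit {
    /**
     * Split a line on its first TAB character, mirroring the splitting rule
     * described for KeyValueTextInputFormat: the text before the first TAB is
     * the key, the text after it is the value. If there is no TAB, the whole
     * line is the key and the value is empty.
     */
    public static String[] split(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, tab), line.substring(tab + 1) };
    }

    public static void main(String[] args) {
        String[] kv = split("alpha\tbeta\tgamma");
        // Only the first TAB separates key from value; later TABs stay in the value.
        System.out.println("key=" + kv[0] + " value=" + kv[1]);
    }
}
```

Note that only the first tab matters: a line such as "alpha\tbeta\tgamma" yields the key "alpha" and the value "beta\tgamma", which is why the job in Listing 2-1 sorts on the portion of each line before the first tab.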