Monday, June 22, 2009

Hadoop Operations
Managing Petabytes with Open Source
Jeff Hammerbacher
Chief Scientist and Vice President of Products, Cloudera
June 22, 2009

My Background
Thanks for Asking
▪ hammer@cloudera.com
▪ Studied Mathematics at Harvard
▪ Worked as a Quant on Wall Street
▪ Conceived, built, and led Data team at Facebook
  ▪ Nearly 30 amazing engineers and data scientists
  ▪ Several open source projects and research papers
▪ Founder of Cloudera
  ▪ Building cost-effective data management tools for the world

Presentation Outline
Exceedingly Unlikely to Be Completed
▪ Hadoop overview and sample use cases
▪ Cloudera and Hadoop
▪ Hadoop project mechanics
▪ Cluster facilities, hardware, and system software
▪ Installation and configuration
▪ HDFS (main focus with limited time)
▪ MapReduce
▪ Cluster lifecycle and maintenance
▪ Questions and discussion

Presentation Sources
For Further Reading
▪ "Hadoop: The Definitive Guide"
  ▪ Tom White's book from O'Reilly
  ▪ Many figures in this presentation taken from the book
▪ "Hadoop Cluster Management"
  ▪ Marco Nicosia's USENIX 2009 presentation
▪ Cloudera blog and Get Satisfaction page
▪ Hadoop documentation
▪ MarkMail mailing list archives
▪ Hadoop wiki

What is Hadoop?
▪ Apache Software Foundation project, mostly written in Java
▪ Inspired by Google infrastructure
▪ Software for programming warehouse-scale computers (WSCs)
▪ Hundreds of production deployments
▪ Project structure
  ▪ Hadoop Distributed File System (HDFS)
  ▪ Hadoop MapReduce
  ▪ Hadoop Common (formerly "Hadoop Core")
  ▪ Other subprojects
    ▪ Avro, HBase, Hive, Pig, ZooKeeper

Anatomy of a Hadoop Cluster
▪ Commodity servers
  ▪ 1 RU, 2 x 4-core CPU, 8 GB RAM, 4 x 1 TB SATA, 2 x 1 GbE NIC
▪ Typically arranged in a two-level architecture
  ▪ 40 nodes per rack
▪ Inexpensive to acquire and maintain
[Figure: "Commodity Hardware Cluster," from ApacheCon US 2008. Typically a two-level architecture: nodes are commodity Linux PCs, 40 nodes per rack, an 8 gigabit uplink from each rack, and 1 gigabit all-to-all within the rack.]

HDFS
▪ Pool commodity servers into a single hierarchical namespace
▪ Break files into 128 MB blocks and replicate blocks
▪ Designed for large files written once but read many times
▪ Two major daemons: NameNode and DataNode
  ▪ NameNode manages file system metadata
  ▪ DataNode manages data using the local file system
▪ HDFS manages checksumming, replication, and compression
▪ Throughput scales nearly linearly with the number of nodes in the cluster
▪ Access from Java, C, command line, FUSE, WebDAV, or Thrift
  ▪ Generally not mounted like a usual file system

HDFS
HDFS distributes file blocks among servers
[Figure: HDFS stores three complete copies of each file by copying each piece to three different servers; in the example shown, any two servers can fail and the entire file will still be available.]

Hadoop MapReduce
▪ Fault-tolerant execution layer and API for parallel data processing
▪ Can target multiple storage systems
▪ Key/value data model
▪ Two major daemons: JobTracker and TaskTracker
▪ Many client interfaces
  ▪ Java
  ▪ C++
  ▪ Streaming
  ▪ Pig
  ▪ SQL (Hive)

MapReduce
MapReduce pushes work out to the data
[Figure: Hadoop uses HDFS' data distribution strategy to push work out to the nodes in a cluster, so analyses run in parallel and avoid the bottlenecks imposed by monolithic storage systems.]
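The HDFS and MapReduce slides describe files broken into 128 MB blocks, each block copied to several servers, and MapReduce tasks sent to the nodes that already hold the data, using a key/value model. The Python sketch below is a toy, single-process model of those ideas, not Hadoop's API. Its assumptions: a replication factor of 3 (matching the figure's "three copies"), round-robin replica placement (real HDFS placement is rack-aware), hypothetical DataNode names, and function names invented for illustration.

```python
from collections import defaultdict
from itertools import groupby

# 128 MB is the block size quoted on the HDFS slide. A replication factor of 3
# is an assumption here, matching the figure's "three complete copies".
BLOCK_SIZE = 128 * 1024 * 1024
REPLICATION = 3


def num_blocks(file_size, block_size=BLOCK_SIZE):
    """Blocks needed for a file of file_size bytes; the last block may be partial."""
    return -(-file_size // block_size)  # ceiling division without floats


def place_replicas(block_count, nodes, replication=REPLICATION):
    """Toy placement: put each block's replicas on `replication` distinct nodes,
    chosen round-robin. Real HDFS placement is rack-aware; this is not."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        b: [nodes[(b + i) % len(nodes)] for i in range(replication)]
        for b in range(block_count)
    }


def schedule_local(placement):
    """Assign each block's map task to the least-loaded node that already
    stores a replica of that block, so the work moves to the data."""
    load = defaultdict(int)
    assignment = {}
    for block, holders in sorted(placement.items()):
        node = min(holders, key=lambda n: load[n])  # ties go to the first holder
        load[node] += 1
        assignment[block] = node
    return assignment


def map_block(lines):
    """Map phase: emit a (word, 1) key/value pair for every word in a block."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle_and_reduce(map_outputs):
    """Shuffle groups pairs by key (here via a sort); reduce sums each group."""
    pairs = sorted(pair for output in map_outputs for pair in output)
    return {key: sum(v for _, v in group)
            for key, group in groupby(pairs, key=lambda kv: kv[0])}


if __name__ == "__main__":
    print(num_blocks(1024 ** 3))  # a 1 GiB file needs 8 blocks

    # Hypothetical three-block "file" (one short line per block) on four DataNodes.
    blocks = [["the quick fox"], ["the lazy dog"], ["The end"]]
    nodes = ["dn1", "dn2", "dn3", "dn4"]
    placement = place_replicas(len(blocks), nodes)
    assignment = schedule_local(placement)
    # In a real cluster, map task b would run on node assignment[b];
    # here every map task simply runs in this process.
    outputs = [map_block(blocks[b]) for b in sorted(assignment)]
    print(assignment)
    print(shuffle_and_reduce(outputs))
```

Picking the least-loaded node among a block's replica holders is one simple way to keep map tasks data-local while spreading work; Hadoop's actual JobTracker scheduling is more involved and is not modeled here.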

Hadoop Subprojects
▪ Avro
  ▪ Cross-language serialization for RPC and persistent storage
▪ HBase
  ▪ Table storage on top of HDFS, modeled after Google's BigTable
▪ Hive
  ▪ SQL interface to structured data