© Cloudera, Inc. All rights reserved.

Apache Hadoop at 10: The Evolution and Future of the Hadoop Ecosystem

Introduction (the evolution and future of me)
[Chart: mailing list messages sent by Todd Lipcon, annotated with the milestones below]
- Early user of Hadoop
- Joined Cloudera as Software Engineer
- Work on HDFS, HBase, MR (HA, performance, stability, etc.)
- Became a committer, PMC member, and ASF Member
- Spoke at China Hadoop Summit '13
- Founded the Kudu project within Cloudera
- Secretly developing with a small team for 3 years
- Kudu announced and contributed to the ASF as Apache Kudu (incubating)

Happy birthday!
Apache Hadoop: the last 10 years

Pre-historic Hadoop (1999-2005)

The Original Inspirations for Hadoop (2003, 2004)

Evolution of the Hadoop Platform
[Figure: the component stack growing year by year along a timeline of 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014-15]
The stack is continually evolving and growing!
Stages shown:
Basics
- Very basic Hadoop
- Batch processes only
- Not stable, fast, or featureful
Production
- Expanding feature set
- Basic security, HA, stability
- Commercial distributions
Enterprise
- New workloads
- Security
- Performance
- Fast full-featured SQL
Components shown: Core Hadoop (HDFS, MapReduce), Pig, Solr, ZooKeeper, HBase, Mahout, Hive, Avro, Whirr, Sqoop, Hue, HCatalog, MRUnit, Oozie, Bigtop, Flume, YARN, Impala, Spark, Tez, Kafka, Drill, Sentry, Parquet, Flink, Ibis

Evolution of Hadoop (Basics / 2006-2007)
• HDFS and MapReduce only
• Part of Apache Lucene/Nutch (for building search index)
• Support basic batch workloads. No high availability.
• Performance not important
• Bottlenecked on search-related processing
• Batch only
• Early adopters (Facebook, Yahoo, etc.)

Evolution of Hadoop (Production / 2008-2011)
• 2008 - Apache Hadoop becomes a top-level project
• No longer just for search indexing
• ETL, basic SQL, recommendations/personalization, etc.
• Commercialization of the Apache Hadoop ecosystem
• Cloudera founded in August 2008, other vendors follow
• "Hadoop: The Definitive Guide" published
• First Hadoop World conference in NYC
• Expanding to traditional enterprises

Evolution of Hadoop (Production / 2008-2011)
• HDFS evolves to add high availability and security
• Still focused on batch workloads (ETL, index building, offline analytics)
• But:
  • Inefficient file formats commonly used (text)
  • Query engines are slow! No interactive SQL!
• Apache HBase becomes an Apache Top-Level Project (TLP)
  • Introduces fast random access
  • Early adopters experiment with new use cases
  • Deployed at Facebook and other large companies

Evolution of Hadoop (Enterprise / 2012-2015)
• Reliable core brings new users and use cases
• Enterprise features: access control, disaster recovery, encryption
• Expanded capabilities: Spark machine learning, Streaming, etc.
• Introduction of fast query engines
• 10-100x faster SQL-on-Hadoop (Impala, Spark, etc.)
• Pushes HDFS performance improvements: caching, CPU efficiency, columnar file formats (Apache Parquet, ORCFile)

Apache Spark: A Better MapReduce
Easy, Expressive API
• Rich API (Java, Scala, and Python)
• Interactive shell
• 2-5x less code needed than MR
Fast Execution
• General execution graphs
• In-memory storage
• Order-of-magnitude improvement over MR

Why Did the Hadoop Ecosystem Succeed?
1. Open source community and license
A large and diverse community of developers has historically made, and continues to make, the Hadoop ecosystem among the most active and engaged in history, while the Apache License lowers the barrier to entry for users.
2. Extensibility/adaptability
With the possible exception of Linux, no other complex platform has evolved on so many levels, and so quickly, to meet user requirements over time.
3. R&D investment from vendors and users
Over $1B USD invested in Hadoop vendors such as Cloudera, Hortonworks, etc. Collaboration from user teams like Facebook, Twitter, Yahoo, Xiaomi, etc.

Building a business on open source
Making money giving away software for free?

Top 3 Reasons Open Source is Good for Customers
These benefits derive from use of the permissive Apache License.
[1] Free Evaluation
Install, test, inspect, and evaluate open source code in perpetuity, with no financial obligation.
[2] Freedom from Lock-in
Use open source software in production without paying royalties or for support.
[3] Scalable Innovation
The collective work of a global, passionate community keeps the codebase evolving.

Top 3 Reasons Open Source is Risky for Customers
[1] No support
Email a mailing list? No SLAs or 24/7 support.
[2] Limited QA
New releases off trunk may not be ready to use. Complex compatibility matrix, and no integration testing with commercial partners.
[3] Moves too fast
Ecosystem of 20+ open source projects means a new version comes out every week. May need an entire engineering team to keep track!

CDH: 100% Apache Hadoop, And More
Per the proven Red Hat Enterprise Linux model of open source distribution
Curated fix
Title: Apache-Hadoop-Ecosystem: The Road of Innovation over Ten Years of Development
Link: https://www.777doc.com/doc-1493913.html