Big Data Analytics Beyond Hadoop
Real-Time Applications with Storm, Spark, and More Hadoop Alternatives

Vijay Srinivas Agneeswaran, Ph.D.

Associate Publisher: Amy Neidlinger
Executive Editor: Jeanne Glasser Levine
Operations Specialist: Jodi Kemper
Cover Designer: Chuti Prasertsith
Managing Editor: Kristy Hart
Senior Project Editor: Lori Lyons
Copy Editor: Cheri Clark
Proofreader: Anne Goebel
Senior Indexer: Cheryl Lenser
Compositor: Nonie Ratcliff
Manufacturing Buyer: Dan Uhrig

© 2014 by Vijay Srinivas Agneeswaran
Pearson Education, Inc.
Upper Saddle River, New Jersey 07458

For information about buying this title in bulk quantities, or for special sales opportunities (which may include electronic versions; custom cover designs; and content particular to your business, training goals, marketing focus, or branding interests), please contact our corporate sales department at corpsales@pearsoned.com or (800) 382-3419.

For government sales inquiries, please contact governmentsales@pearsoned.com.

For questions about sales outside the U.S., please contact international@pearsoned.com.

Company and product names mentioned herein are the trademarks or registered trademarks of their respective owners. Apache Hadoop is a trademark of the Apache Software Foundation.

All rights reserved. No part of this book may be reproduced, in any form or by any means, without permission in writing from the publisher.

Printed in the United States of America
First Printing April 2014
ISBN-10: 0-13-383794-7
ISBN-13: 978-0-13-383794-0

Pearson Education LTD.
Pearson Education Australia PTY, Limited.
Pearson Education Singapore, Pte. Ltd.
Pearson Education Asia, Ltd.
Pearson Education Canada, Ltd.
Pearson Educación de Mexico, S.A. de C.V.
Pearson Education—Japan
Pearson Education Malaysia, Pte. Ltd.

Library of Congress Control Number: 2014933363

This book is dedicated at the feet of Lord Nataraja.

Contents

Foreword
About the Author

Chapter 1 Introduction: Why Look Beyond Hadoop Map-Reduce?
    Hadoop Suitability
    Big Data Analytics: Evolution of Machine Learning Realizations
    Closing Remarks
    References

Chapter 2 What Is the Berkeley Data Analytics Stack (BDAS)?
    Motivation for BDAS
    BDAS Design and Architecture
    Spark: Paradigm for Efficient Data Processing on a Cluster
    Shark: SQL Interface over a Distributed System
    Mesos: Cluster Scheduling and Management System
    Closing Remarks
    References

Chapter 3 Realizing Machine Learning Algorithms with Spark
    Basics of Machine Learning
    Logistic Regression: An Overview
    Logistic Regression Algorithm in Spark
    Support Vector Machine (SVM)
    PMML Support in Spark
    Machine Learning on Spark with MLbase
    References

Chapter 4 Realizing Machine Learning Algorithms in Real Time
    Introduction to Storm
    Design Patterns in Storm
    Implementing Logistic Regression Algorithm in Storm
    Implementing Support Vector Machine Algorithm in Storm
    Naive Bayes PMML Support in Storm
    Real-Time Analytic Applications
    Spark Streaming
    References

Chapter 5 Graph Processing Paradigms
    Pregel: Graph-Processing Framework Based on BSP
    Open Source Pregel Implementations
    GraphLab
    References

Chapter 6 Conclusions: Big Data Analytics Beyond Hadoop Map-Reduce
    Overview of Hadoop YARN
    Other Frameworks over YARN
    What Does the Future Hold for Big Data Analytics?
    References

Appendix A Code Sketches
    Code for Naive Bayes PMML Scoring in Spark
    Code for Linear Regression PMML Support in Spark
    PageRank in GraphLab
    SGD in GraphLab

Index

Foreword

One point that I attempt to impress upon people learning about Big Data is that while Apache Hadoop is quite useful, and most certainly quite successful as a technology, the underlying premise has become dated. Consider the timeline: MapReduce implementation by Google came from work that dates back to 2002, published in 2004. Yahoo! began to sponsor the Hadoop project in 2006. MR is based on the economics of data centers from a decade ago. Since that time, so much has changed: multi-core processors, large memory spaces, 10G networks, SSDs, and s