© 2009 VMware Inc. All rights reserved

Serengeti: Virtualize Your Big Data Applications
Lin Yonghua (蔺永华), VMware, Inc.

Agenda
• Today's big data system
• Why virtualize Hadoop?
• Serengeti introduction
• Common questions about virtualization
• Serengeti solution
• Deep insight into Serengeti
• Summary
• Q&A

Today's Big Data System
[Diagram labels: ETL; Unstructured Data (HDFS); Real-Time Structured Database; Big SQL; Data-Parallel Batch Processing; Real-Time Streams; Real-Time Processing (S4, Storm); Analytics]

Challenges of Using Hadoop on Physical Infrastructure
Deployment
• Difficult to deploy: takes several people several days, or even months
• Difficult to tune cluster performance
Low Efficiency
• Hadoop clusters typically do not fully utilize all hardware resources
• Difficult to share resources safely between different workloads
Single Point of Failure
• NameNode and JobTracker are single points of failure
• No HA for Hive, HCatalog, etc.

Why Virtualize Hadoop? - Get your Hadoop cluster in minutes
• 1/1000 of the human effort; minimal Hadoop operations knowledge required
• Fully automated process: 10 minutes to get a Hadoop/HBase cluster from scratch
• Steps: server preparation, OS installation, network configuration, Hadoop installation and configuration
• Manual process: takes days
• Automated by Serengeti on vSphere, following best practices

Why Virtualize Hadoop? - Consolidate sprawling clusters
Cluster Sprawl
• Single-purpose clusters for various business applications lead to cluster sprawl
[Diagram labels: Hadoop Dev, Hadoop Prod, HBase]
Cluster Consolidation
• Clusters share servers with strong isolation
Simplify
• Single hardware infrastructure
• Unified operations
Optimize
• Shared resources = higher utilization
• Elastic resources = faster on-demand access
[Diagram labels: Finance Hadoop, Virtualization Platform, Hadoop Dev, Hadoop Prod, HBase, ..., Portal Hadoop]
• CAPEX down 30%

Why Virtualize Hadoop? - Utilize all your resources to solve the priority problem
• 50%+ of resources sit idle while a high-priority job is burning up its own cluster
• Utilize all resources from the pool on demand
• Dynamic elastic scaling on a shared resource pool
• 3x faster to get analytic results

vSphere High Availability (HA) - protection against unplanned downtime
Overview
• Protection against host and VM failures
• Automatic failure detection (host, guest OS)
• Automatic virtual machine restart within minutes, on any available host in the cluster
• OS- and application-independent; does not require complex configuration changes

High Availability for the Hadoop Stack
[Diagram labels: ZooKeeper (coordination); management server; HDFS (Hadoop Distributed File System); HBase (key-value store); MapReduce (job scheduling/execution system); Pig (data flow); Hive (SQL); BI reporting; ETL tools; RDBMS; JobTracker; NameNode; Hive MetaDB; HCatalog; HCatalog MDB; server; HA-protected application/OS virtual machines on VMware ESX hosts]

vSphere Fault Tolerance (FT) provides continuous protection
Overview
• Single identical VMs running in lockstep on separate hosts
• Zero-downtime, zero-data-loss failover for all virtual machines in case of hardware failures
• Integrated with VMware HA/DRS
• No complex clustering or specialized hardware required
• Single common mechanism for all applications and operating systems
Zero downtime for NameNode, JobTracker and other components in Hadoop clusters

Serengeti Introduction: Easy and rapid deployment and management
• Open source project launched in June 2012; version 0.8 was released in April and 0.9 is planned for June
• Toolkit that leverages virtualization to simplify Hadoop deployment and operations
• Deploy a cluster in 10 minutes, fully automated
• Customize Hadoop and HBase clusters
• Automated cluster operation
• Comes with ecosystem components
• Supports all popular Hadoop distributions

Serengeti Demo: 10 minutes to a Hadoop cluster with Serengeti

Common questions about virtualization
• Local disk: Can local disks be used in a virtualized environment?
• Flexibility and scalability: How can resources be scheduled flexibly between clusters and different applications, as mentioned above?
• Data stability: In a virtual environment, how can data be distributed across hosts and racks?
• Data locality: Hadoop schedules compute tasks near the data to reduce network I/O for data reads and writes. Can a virtual environment achieve the same result?
• Performance: How does Hadoop perform in a virtual environment?

Can I use local disk easily?
Serengeti: Extend the Virtual Storage Architecture to Include Local Disk
Shared Storage: SAN or NAS
• Easy to provision
• Automated cluster rebalancing
Hybrid Storage
• SAN for boot images and other workloads
• Local disk for Hadoop and HDFS
[Diagram labels: hosts running Hadoop VMs alongside other VMs]

How to flexibly scale in and scale out
How can resources be scheduled flexibly between clusters and different applications? - Compute
Current Hadoop: Combined Storage/Compute
[Diagram labels: T1, T2, VM, Storage]
Hadoop in VM
• VM lifecycle determined by the DataNode
• Limited elasticity
Separate Storage; Separate Compute Clusters
• Separate compute from data
• Remove the elasticity constraint imposed by the DataNode
• Elastic compute
• Raise utilization
• Separa
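
The contrast between the combined and the separated layouts on the last slide can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Serengeti code: the `Cluster` class, its fields and `scale_compute` are hypothetical names, and the rule that a combined cluster simply refuses to shrink is a simplification of "VM lifecycle determined by the DataNode".

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Toy model of a virtualized Hadoop cluster (hypothetical, for illustration)."""
    separated: bool        # True: compute VMs are separate from storage VMs
    data_vms: int          # VMs that run a DataNode and hold HDFS blocks
    compute_vms: int = 0   # compute-only VMs; unused in the combined layout

    def compute_capacity(self) -> int:
        # In the combined layout every VM runs storage and compute together.
        return self.compute_vms if self.separated else self.data_vms

    def scale_compute(self, target: int) -> int:
        """Try to reach `target` compute VMs and return the capacity reached."""
        if target < 0:
            raise ValueError("target must be non-negative")
        if self.separated:
            # Compute VMs hold no HDFS data, so they can be added or removed freely.
            self.compute_vms = target
        elif target > self.data_vms:
            # Combined layout: growing means adding VMs that also store data.
            self.data_vms = target
        # Combined layout, shrinking: refused in this model, because powering
        # off a VM would also take its DataNode (and its blocks) offline.
        return self.compute_capacity()


if __name__ == "__main__":
    combined = Cluster(separated=False, data_vms=4)
    separated = Cluster(separated=True, data_vms=4, compute_vms=4)
    print(combined.scale_compute(2))                       # 4: cannot shrink
    print(separated.scale_compute(2), separated.data_vms)  # 2 4: storage untouched
```

In the separated layout the storage tier keeps its four VMs while compute shrinks or grows, which is the property the slide summarizes as "elastic compute" and "raise utilization".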