Speaker: Lin Yonghua (蔺永华), Senior Development Engineer, VMware
© 2009 VMware Inc. All rights reserved

Serengeti: Virtualize Your Big Data Applications
Lin Yonghua (蔺永华), VMware, Inc.

Agenda
• Today’s big data system
• Why virtualize Hadoop?
• Serengeti introduction
• Common questions about virtualization
• Serengeti solution
• Deep insight into Serengeti
• Summary
• Q&A

Today’s Big Data System
[Diagram: ETL, Real-Time Streams, Unstructured Data (HDFS), Real-Time Structured Database, Big SQL, Data-Parallel Batch Processing, Real-Time Processing (S4, Storm), Analytics]

Challenges of Using Hadoop on Physical Infrastructure
Deployment
• Difficult to deploy: it takes several people several days, or even months
• Difficult to tune cluster performance
Low Efficiency
• Hadoop clusters typically do not fully utilize all hardware resources
• Difficult to share resources safely between different workloads
Single Point of Failure
• NameNode and JobTracker are single points of failure
• No HA for Hive, HCatalog, etc.

Why Virtualize Hadoop? - Get your Hadoop cluster in minutes
Manual process (takes days): server preparation, OS installation, network configuration, Hadoop installation and configuration
Fully automated process: 10 minutes to get a Hadoop/HBase cluster from scratch
1/1000 of the human effort; minimal Hadoop operations knowledge required
Automated by Serengeti on vSphere with best practices

Why Virtualize Hadoop? - Consolidate sprawling clusters
Single-purpose clusters for various business applications lead to cluster sprawl.
Clusters share servers with strong isolation.
Simplify
• Single hardware infrastructure
• Unified operations
Optimize
• Shared resources = higher utilization
• Elastic resources = faster on-demand access
[Diagram: cluster sprawl vs. cluster consolidation on a virtualization platform; labels: Hadoop Dev, Hadoop Prod, HBase, Finance Hadoop, Portal Hadoop, ...]
CAPEX down 30%

Why Virtualize Hadoop? - Use all your resources to solve the priority problem
More than 50% of resources sit idle while a high-priority job is burning up its cluster.
Utilize all resources from the pool on demand.
Dynamic elastic scaling on a shared resource pool
3x faster to get analytic results

vSphere High Availability (HA) - protection against unplanned downtime
Overview
• Protection against host and VM failures
• Automatic failure detection (host, guest OS)
• Automatic virtual machine restart in minutes, on any available host in the cluster
• OS- and application-independent; does not require complex configuration changes

High Availability for the Hadoop Stack
[Diagram: HDFS (Hadoop Distributed File System), HBase (key-value store), MapReduce (job scheduling/execution system), Pig (data flow), Hive (SQL), BI reporting, ETL tools, management server, ZooKeeper (coordination), HCatalog, RDBMS, NameNode, JobTracker, Hive MetaDB, HCatalog MDB, Server]

vSphere Fault Tolerance provides continuous protection
[Diagram: App/OS virtual machines on two VMware ESX hosts, protected by FT and HA]
Overview
• Single identical VMs running in lockstep on separate hosts
• Zero-downtime, zero-data-loss failover for all virtual machines in case of hardware failures
• Integrated with VMware HA/DRS
• No complex clustering or specialized hardware required
• Single common mechanism for all applications and operating systems
Zero downtime for NameNode, JobTracker, and other components in Hadoop clusters

Serengeti: easy and rapid deployment and management
Open source project launched in June 2012; version 0.8 was released in April, and 0.9 is planned for June.
Toolkit that leverages virtualization to simplify Hadoop deployment and operations
• Deploy a cluster in 10 minutes, fully automated
• Customize Hadoop and HBase clusters
• Automated cluster operation
• Comes with ecosystem components
• Supports all popular Hadoop distributions

Demo: 10 minutes to a Hadoop cluster with Serengeti

Common questions about virtualization
Local Disk
• Can local disks be used in a virtualized environment?
Flexibility and Scalability
• How can resources be scheduled flexibly between clusters and different applications, as mentioned above?
Data stability
• In a virtual environment, how can we distribute data across hosts and racks?
Data locality
• Hadoop schedules compute tasks near the data to reduce network I/O for data reads and writes. Can a virtual environment achieve the same result?
Performance
• How does Hadoop perform in a virtual environment?

Can I use local disk easily?
Serengeti extends the virtual storage architecture to include local disk
Shared Storage: SAN or NAS
• Easy to provision
• Automated cluster rebalancing
Hybrid Storage
• SAN for boot images, other workloads
• Local disk for Hadoop & HDFS
[Diagram: hosts each running Hadoop VMs alongside other VMs]

How to flexibly scale in/scale out
How can resources be scheduled flexibly between clusters and different applications?

Evolution of Hadoop on VMs - Data/Compute separation
[Diagram: Storage, Compute, tasks T1 and T2, VMs]
Current Hadoop: combined storage/compute
Hadoop in VM
• VM lifecycle determined by DataNode
• Limited elasticity
Separate Storage
• Separate compute from data
• Remove the elasticity constraint imposed by the DataNode
• Elastic compute
• Rais