Open Platform: Apsara Cloud Platform

About Aliyun
- China's largest cloud service provider
- 100s of thousands of customers
- Billions of accesses every day

Providing Foundation Services of the Cloud Eco-system
- Alibaba-operated IDC, Partner-operated IDC
- Apsara Cloud Platform
- ACE, 3rd-party Platform Services
- Map, Mail, Search, 3rd-party Application Services
- Customers, ISV and SI, Developers

The Nature of Cloud Computing
- Scale (large scale)
  - Internet-scale computing
  - 2.5 EB generated per day, doubling every 40 months
  - Billions of txns on Taobao every day, must be processed in 6 hours
- Economy (low cost)
  - Economy means more than low prices
  - Leading to behavior changes (like “telephone”)
  - Key is scheduling (like “power grid”)
- Public Utility (service operation)
  - Pay by usage, elasticity, safety (like “tap water”)

Two Design Principles
- Large-scale general computing platform as the base
  - One system supporting both offline and online services
  - Multi-tenancy, resource sharing, load shifting
- Web-based API as the delivery mechanism
  - Online activation, pay-by-usage
  - Location-transparency

[Architecture diagram labels]
- Linux Cluster, IDC
- Resource Management: Fuxi (伏羲)
- Security: Zhongkui (钟馗)
- RPC: Kuafu (夸父)
- Naming/Coordination: Nuwa (女娲)
- Cluster Deployment: Dayu (大禹)
- Cluster Monitor: Shennong (神农)
- Distributed File System: Pangu (盘古)
- Job Scheduling: Fuxi (伏羲)
- ACE, OSS, OTS, ODPS, ECS/SLB, RDS, OSPS
- Map, Mail, Search, etc.
- CloudMart
- Other Cloud Services

Cloud Computing Services
- Elastic Computing
  - ECS: virtualized instances of servers that can be created and tailored to meet application requirements
  - SLB: software load balancing technology that can elastically expand service capacity on demand
  - ACE: convenient and efficient execution environment for Web services, supporting Java, PHP, Node.js
- Storage and Databases (massive storage and databases)
  - OSS: large-scale object storage service for unstructured data such as photos, music, or video
  - OTS: large-scale storage service for structured or semi-structured data storage and real-time query
  - RDS: managed instances for relational databases with automatic backup and failover
- Large-scale Data Computing
  - ODPS: large-scale data batch processing and computation, supporting SQL and MapReduce-style programming languages
  - OSPS: stream data processing service, supporting SQL-like query language and automatic failure recovery

A Comparison of Storage and Database Services

                        OSS            OTS               RDS
Data Model              Unstructured   Semi-structured   Fully-structured
Target Data Volume      ~10PB          ~100TB/table      ~TB/db
Txn Support             None           Limited support   Full support
Programming Interface   RESTful API    RESTful API       SQL

Apsara Technical Highlights
- A common platform supporting both offline and online services
  - Search: 24B pages processed, 13B online index
  - Mail: 100M mails received, 10M mails sent, 10 ms latency
- Capability-based security management framework, enforcing the Principle of Least Privilege
- Distributed deployment, monitoring and diagnostics
- Zero SPOF (single point of failure): availability 99.9%
- All data has 3 replicas: data reliability 99.99999999%

5K
- 2013/08/15: First-ever 5000-node Apsara cluster (ODPS) went into production
  - 100K CPU cores, 100PB raw storage
  - Processing petabytes per day
- 2013/09/24: Opened access to ODPS for 4 universities & research institutions
- Sorting 100TB in 30 minutes
  - Current known record: 72 minutes (Yahoo!, 2013/07/03)

Pangu: Large-scale Distributed File System
- Master-Slave architecture
  - Master for metadata mgmt, Slave (ChunkServer) for IO mgmt
- Paxos-based multi-master architecture, failure recovery time 1 minute
- End-to-end inline checksum
- Scales to 1 billion files

Separated IO Pipeline and Storage Mgmt
- Adaptive IO Pipeline
  - Replication master: chunkserver vs client
  - Replication policy: chaining vs star-replication
  - Chunking policy: fixed, variable, or RAID
  - Durability guarantee: txn logging vs sequential write
- Common Storage Management
  - Physical IO management
  - Priority and QoS
  - Background re-replication
  - Chunk placement

Staged Event-driven Physical IO Mgmt
- ChunkServer would rearrange IO requests to support priority and QoS, and to reduce IO seek overhead

Distributed Re-replication
- Typical: Mirroring (10 hours)
- Pangu: Distributed re-replication (20 min, 50 nodes)

RAID
- Built into the core system instead of an add-on layer (as in HDFS RAID)
- Better management of data integrity, recovery, and chunk placement
- Synchronous redundancy block generation
- Low-latency failure recovery
- Small file support

Fuxi
[Diagram labels: Job control, Resource requests, Node control, Job submission]

Resource Scheduling
- Multi-dimension resources
- Elastic quota
- CGroup-based isolation
- FuxiMaster HA
- AppMaster failover
- Incremental scheduling

Fuxi Job Programming Model
- Job: a DAG
- Vertex: task
  - Each task may have multiple instances based on input data chunks
- Edge: data flow; each task may have multiple input/output flows
  - A data flow connecting two tasks represents data shuffling
- MapReduce is a degenerate case
[Diagram labels: input1, input2, output1, output2, input, output]

Example: Find Best-Sellers

    SELECT prod_id, Sum(count) AS quantity
    FROM orders
    GROUP BY prod_id
    ORDER BY quantity DESC;

[Table: orders, with columns order_id, prod_id, unit_price, count]

Result:

prod_id   quantity
02518     31790
07563     28451
09641     22943
04215     20043
…         …

[Figure: Comparison of Job-execution Plan, Fuxi vs. MapReduce; each plan runs from Input through (prod_id, count) to (prod_id, quantity) stages]

[Figure: Comparison of Job Execution, Fuxi vs. MapReduce; both axes run from 0 to 120]

A Brief History
- 02/04/2009: First line of Apsara code
- 08/27/2010: Apsara became the common platform for sear
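
Returning to the Find Best-Sellers example from the Fuxi job programming model: the slides describe a job as a DAG whose vertices are tasks and whose edges are data shuffles. The sketch below walks that query through three chained tasks in plain Python. It is only an illustration under stated assumptions, not Fuxi or ODPS code: the function names (map_task, shuffle, reduce_task, sort_task, best_sellers), the chunk and partition sizes, and the sample rows are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows (prod_id, count); not the data shown in the slides.
ORDERS = [("P1", 3), ("P2", 10), ("P3", 10), ("P1", 5), ("P2", 1)]


def map_task(chunk):
    """Vertex 1: project each order row to a (prod_id, count) pair."""
    return [(prod_id, count) for prod_id, count in chunk]


def shuffle(mapped_chunks, num_partitions):
    """Edge: route pairs by key so all counts for one prod_id meet in one partition."""
    partitions = [[] for _ in range(num_partitions)]
    for pairs in mapped_chunks:
        for prod_id, count in pairs:
            partitions[hash(prod_id) % num_partitions].append((prod_id, count))
    return partitions


def reduce_task(partition):
    """Vertex 2: GROUP BY prod_id, SUM(count) within one partition."""
    totals = defaultdict(int)
    for prod_id, count in partition:
        totals[prod_id] += count
    return list(totals.items())


def sort_task(reduced_partitions):
    """Vertex 3: ORDER BY quantity DESC over the merged partial results."""
    merged = [row for part in reduced_partitions for row in part]
    # Ties on quantity are broken by prod_id to keep the output deterministic.
    return sorted(merged, key=lambda row: (-row[1], row[0]))


def best_sellers(orders, chunk_size=2, num_partitions=2):
    """Run the three tasks in order; each input chunk gets its own map instance."""
    chunks = [orders[i:i + chunk_size] for i in range(0, len(orders), chunk_size)]
    mapped = [map_task(chunk) for chunk in chunks]
    reduced = [reduce_task(p) for p in shuffle(mapped, num_partitions)]
    return sort_task(reduced)


print(best_sellers(ORDERS))  # [('P2', 11), ('P3', 10), ('P1', 8)]
```

In this sketch the aggregation output feeds the sort task directly. With a strict map-then-reduce job shape, the same query would typically be expressed as two chained jobs, which is one way to read the slides' remark that MapReduce is a degenerate case of the DAG model.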
Title: Apsara Open Platform - Big Data Technology Conference - 4x324
Link: https://www.777doc.com/doc-30555 .html