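© Copyright IBM Corporation 2014
IBM Storage Solutions: Storage for Data Analytics
IBM STG, Xie Wenhua (wenhuax@cn.ibm.com)

Page 2: Extending from enterprise data to big data
• Traditional approach: structured, analytical, logical. Systems of Record; structured, repeatable, linear.
  – Traditional sources: transaction data, internal app data, mainframe data, OLTP system data, ERP data; Data Warehouse
• New approach: creative, holistic thought, intuition. Systems of Engagement; unstructured, exploratory, dynamic.
  – New sources: web logs, social data, text data (emails), sensor data (images), RFID, multimedia; Hadoop and Streams
• Systems of Insight: systems capable of insight, built on enterprise integration and context accumulation

Page 3: Requirements for a new kind of infrastructure
• Run business-critical applications in a reliable and secure environment
• Access and process massive volumes of data, both structured and unstructured
• Speed: respond promptly to business opportunities whenever they arise, which requires a flexible, real-time infrastructure
• The dynamics of SoR and SoE:
  – Increase flexibility and efficiency by optimizing workload and resource deployment
  – Improve IT economics by adopting new technologies, including those based on open standards
• System of Record (SoR) + Systems of Engagement (SoE) → Big Data & Analytics: the right decision, in the right place, at the right time

Page 4: A new architecture for big data analytics
IBM Big Data & Analytics Infrastructure (IBM Watson Foundations), deployable on premise, in the cloud, or as a service
• Data zone (all data): Landing, Exploration and Archive data zone; EDW and data mart zone; Operational data zone; Deep Analytics data zone
• Application zone, leading to new/enhanced applications:
  – Real-time Data Processing & Analytics
  – What is happening? Discovery and exploration
  – Why did it happen? Reporting and analysis
  – What could happen? Predictive analytics and modeling
  – What action should I take? Decision management
  – What did I learn, what's best? Cognitive
• Foundation: Information Integration & Governance; Systems; Security; Storage
• Example sources and applications: meters, grid, customers, location, ERP, billing systems, outage management, call centers, fraud/theft protection, customer self-serve portals

Page 5: Case: Smart Metering (smart electricity billing)
Big data analytics applications can deliver real business value.
• Business areas: Grid Operations; Field Service; Resource Planning; Customer Service/Customer Operations; Regulatory Compliance
• Benefits:
  – Truly effective regulatory compliance
  – Timely detection of energy losses, electricity theft and fraud
  – Higher customer satisfaction
  – More accurate forecasts of electricity usage
  – Optimized grid operation and maintenance
  – Fewer and shorter outages

Page 6: Case: Strengthening Smart Metering with big data analytics
(Architecture diagram as on Page 4.)
• High availability for data analytics, so that customer preferences are always known
• Terabyte-scale data requirements across applications: a common virtualized storage platform
• Collect, store and analyze data in real time, at up to 50,000 data points/sec
• Complex queries over historical electricity-usage data
• Cleanse and validate data before it is loaded into the data warehouse; the data may come from many customers, billing systems or outage-protection systems
• Relationship control: build and maintain a single view of the grid
• Navigate structured and unstructured data across the whole enterprise and discover value in it
• Analyze customer electricity usage to detect theft, meter tampering and similar behavior
• Predict which customers suit which time-of-use tariffs or demand/response services
• Real-time time-of-use pricing, or timely demand/response services

Page 7: IBM Big Data & Analytics Reference Architecture
• All data sources: streaming data, text data, applications data, time series, geospatial, relational, social network, video & image
• Big Data Platform capabilities: Information Ingest; Real-time Analytics; Warehouse & Data Marts; Analytic Appliances
• Advanced analytics / new insights:
  – Exploration and Discovery: What do you have?
  – Descriptive: What has happened?
  – Predictive: What could happen?
  – Prescriptive: Best outcomes?
  – Cognitive: Learn dynamically?
• New/enhanced applications: automated process, case management, analytic applications, Watson, cloud services, ISV solutions, alerts

Page 8: New Infrastructure Leverages Data Types
Data in motion, data at rest, data in many forms
• Information Ingestion and Operational Information: stream processing, data integration, master data
• Landing Area, Analytics Zone and Archive: raw data, structured data, text analytics, data mining, entity analytics, machine learning
• Real-time Analytics: video/audio, network/sensor, entity analytics, predictive
• Exploration, Integrated Warehouse, and Mart Zones: discovery, deep reflection, operational, predictive
• Analytics applications: Decision Management; BI and Predictive Analytics; Navigation and Discovery; Intelligence Analysis
• Across all zones: Information Governance, Security and Business Continuity
• Products: BigInsights, Streams, Warehouse

Page 10: IBM Big Data Platform
• Platform categories: MPP Data Warehouse, Stream Computing, Information Integration, Hadoop (Velocity, Variety & Volume)
• Data at rest:
  – InfoSphere BigInsights: Hadoop-based low-latency analytics over diverse, massive data at rest
  – Netezza High Capacity Appliance: queryable archive of structured data
  – Netezza 1000: BI plus custom analytics on structured data
  – Smart Analytics System: operational analytics on structured data
  – Informix TimeSeries: time-structured analytics
  – InfoSphere Warehouse: high-volume analytics on structured data
• Data in motion:
  – InfoSphere Streams: low-latency analytics on streaming data
• InfoSphere Information Server: integration and transformation of massive data
• Apache Hadoop: an open framework for distributed processing of large data sets across clusters of servers, using a simple programming model

Page 11: What is Hadoop?
• What: open-source software that distributes data and computation across a cluster of commodity servers and their storage
• Why: traditional computing architectures scale up (vertically), feeding data to the CPUs through faster SANs, large memory and multi-level caches, which is expensive
• What: Hadoop partitions a large data set into small subsets and distributes them across many ordinary servers; it scales out (horizontally)
• Why: scalable, flexible, cost-effective, fault-tolerant
• Components: MapReduce, HDFS

Page 12: Hadoop explained: MapReduce and HDFS
(Diagram: an HDFS cluster with a NameNode (metadata store) and nodes running on the operating system, beside an Elastic Storage-SNC cluster implemented at kernel level. IBM Value for Hadoop!)
• HDFS stores data spread across multiple storage nodes
• HDFS is designed on the assumption that storage nodes can fail, so it replicates each piece of data into three or more copies spread over several nodes; this makes the system reliable as a whole
• An HDFS file system consists of a cluster of server nodes, each of which serves block data over the network using HDFS's own block protocol
• The HDFS NameNode is a potential single point of failure
• IBM improves file-system performance and eliminates the single point of failure with Elastic Storage-SNC (available as beta code)

Page 13: Hadoop stack: what does it look like? (diagram)

Page 14: Pain points of typical Hadoop storage
• Choosing the right HDFS components (software, servers, network, storage, etc.) is difficult
• Moving from a test environment to production requires far too much tuning and adjustment
• Long-term, ongoing operations are too burdensome: failed components (especially hard disks) must be replaced constantly, which makes the expected SLAs very hard to meet
• Decoupling CPU and storage
  – Originally…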
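To put the smart-metering figure of up to 50,000 data points/sec (Page 6) in storage terms, the sketch below multiplies it out per day. The 100 bytes per data point is a hypothetical record size chosen for illustration (the slides give none), the rate is treated as sustained all day (so the result is an upper bound), and the 3x factor assumes the three-way replication described for HDFS on Page 12.

```python
# Back-of-envelope ingest volume for the smart-metering case.
SECONDS_PER_DAY = 24 * 60 * 60      # 86,400 s
PEAK_POINTS_PER_SEC = 50_000        # peak rate quoted in the case study
BYTES_PER_POINT = 100               # hypothetical record size (assumption, not from the slides)


def daily_bytes(points_per_sec: int, bytes_per_point: int, replicas: int = 1) -> int:
    """Bytes written per day at a sustained rate, multiplied by the replica count."""
    return points_per_sec * SECONDS_PER_DAY * bytes_per_point * replicas


if __name__ == "__main__":
    points_per_day = PEAK_POINTS_PER_SEC * SECONDS_PER_DAY
    logical = daily_bytes(PEAK_POINTS_PER_SEC, BYTES_PER_POINT)
    physical = daily_bytes(PEAK_POINTS_PER_SEC, BYTES_PER_POINT, replicas=3)
    print(f"{points_per_day:,} points/day")
    print(f"{logical / 1e9:,.0f} GB/day logical, {physical / 1e12:.2f} TB/day with 3 replicas")
```

At the assumed 100 bytes per point this is 4.32 billion points, about 432 GB of logical data per day, or roughly 1.3 TB once tripled, which is in line with the terabyte-scale requirement on the slide.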
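Page 11 describes Hadoop as splitting a large data set into small subsets spread over many servers, with MapReduce as one of its two components. The single-process sketch below imitates the map, shuffle and reduce phases on hypothetical smart-meter records of the form `meter_id,kwh`. It is not Hadoop's API: the function names are illustrative, and the splits are processed one after another instead of in parallel on separate nodes.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def split_input(lines: list[str], split_size: int) -> list[list[str]]:
    """Cut the input into fixed-size splits, analogous to Hadoop input splits."""
    return [lines[i:i + split_size] for i in range(0, len(lines), split_size)]


def map_phase(split: list[str]) -> Iterator[tuple[str, float]]:
    """Map: parse each 'meter_id,kwh' record into a (key, value) pair."""
    for line in split:
        meter_id, kwh = line.split(",")
        yield meter_id.strip(), float(kwh)


def shuffle(pairs: Iterable[tuple[str, float]]) -> dict[str, list[float]]:
    """Shuffle: group every value emitted for the same key."""
    groups: dict[str, list[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: list[float]) -> tuple[str, float]:
    """Reduce: total consumption per meter."""
    return key, sum(values)


def run_job(lines: list[str], split_size: int = 2) -> dict[str, float]:
    splits = split_input(lines, split_size)
    # On a real cluster each split would be mapped on a different node.
    mapped = (pair for split in splits for pair in map_phase(split))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    readings = ["m1,1.5", "m2,0.5", "m1,2.0", "m3,4.0", "m2,1.0"]
    print(run_job(readings))
```

The map step only ever sees one split, so splits can be handed to different machines. Only the shuffle has to bring values with the same key together. That independence is what makes the scale-out model on Page 11 work.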
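The replication point on Page 12 can be illustrated with a toy model. A file is cut into fixed-size blocks, each block is put on three distinct nodes, and the model checks whether every block survives a given set of node failures. The round-robin placement, node names and sizes are assumptions for illustration only. Real HDFS uses a rack-aware placement policy, and its block size and replication factor are configurable. Here the `placement` map stands in for the NameNode's metadata. Without that one map the blocks cannot be located even if they all survive, which is the single-point-of-failure concern on the slide.

```python
import math


def block_count(file_size_mb: int, block_size_mb: int) -> int:
    """Blocks needed for a file; the last block may be only partly full."""
    return math.ceil(file_size_mb / block_size_mb)


def place_replicas(num_blocks: int, nodes: list[str], replication: int = 3) -> dict[int, list[str]]:
    """Map each block id to `replication` distinct nodes.

    Simplified round-robin placement (an assumption); real HDFS is rack-aware.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        block: [nodes[(block + i) % len(nodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


def survives(placement: dict[int, list[str]], failed: set[str]) -> bool:
    """True if every block still has at least one replica on a live node."""
    return all(any(node not in failed for node in replicas) for replicas in placement.values())


if __name__ == "__main__":
    nodes = [f"node{i}" for i in range(1, 6)]                   # hypothetical 5-node cluster
    blocks = block_count(file_size_mb=1000, block_size_mb=128)  # 8 blocks
    placement = place_replicas(blocks, nodes)
    # With 3 distinct replicas per block, any 2 node failures leave data readable.
    print(survives(placement, failed={"node1", "node2"}))
    # Block 0 lives on node1..node3, so losing all three makes it unreadable.
    print(survives(placement, failed={"node1", "node2", "node3"}))
```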
Document title: Big Data Analytics Storage Solutions 41
Link: https://www.777doc.com/doc-26109 .html