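Achieving Second-Level Response for Ad Hoc Big Data Queries

Liang Chen (陈亮)
Leader, Big Data Open Source Development Department, Huawei
Apache CarbonData PMC member and committer
More than 10 years of experience developing and delivering big data and BI projects, with a deep understanding of open-source big data technologies (Hadoop, Spark, CarbonData, and others).
Email: chenliang613@apache.org

Big data will profoundly change telecom operators, now and in the future

Market analysis
• Real-time marketing and recommendation
• Fine-grained customer segmentation and personalized recommendation
• Forecasting and influence analysis

Customer care and CEM
• 360° customer insight
• Customer loyalty retention
• Customer care and process optimization

Network efficiency
• Network performance management and SQM
• Policy assurance
• Fast decision-making and root-cause analysis
• Network problem location and planning

Data monetization
• Data monetization
• Open cooperation and competition with OTT providers
• M2M and location analytics

[Figure: SoftCom architecture diagram (Smarter SoftCom: intelligent convergence of business and operations). Numbered use cases recoverable from the diagram: SDN real-time elephant-flow mining; IP RAN traffic simulation; SON automatic real-time network optimization; fast fault correlation and handling; dynamic cell congestion control; retention of potential churners; one-stop service optimization; open monetization.]

How to choose storage for complex big data requirements?

NoSQL database
• Key-value store: low latency, 5 ms
• Cannot support multi-dimensional queries

Multi-dimensional problem
• Pre-computes all aggregation combinations
• Complexity: O(2^n)
• Dimension 10
• Too much space
• Slow loading speed

Shared-nothing database
• Parallel scan + distributed compute
• Questionable scalability and fault tolerance
• Cluster size 100 data nodes
• Not suitable for big batch jobs
• Cannot integrate with the Hadoop ecosystem

Search engine
• All columns indexed
• Fast searching
• Simple aggregation
• Designed for search, not OLAP
• Complex computation: TopN, join, multi-level aggregation
• No SQL support

SQL on Hadoop
• Modern distributed architecture, scales well in computation
• Pipeline based: Impala, Drill, Flink, …
• BSP based: Hive, SparkSQL
• BUT still uses file formats designed for batch jobs
• Focuses on scan only
• No index support, not suitable for point or small-scan queries

Architect's choice
• Choice 1: Compromising
• Choice 2: Replicating data
[Figure: two diagrams, one labeled Loading and one labeled Replication, each feeding App1, App2, and App3.]

Agenda
• CarbonData project background and suitable scenarios
• Key technologies
• Performance and demo
• Apache CarbonData community and roadmap

Customer requirements
• Multi-dimensional ad hoc analysis
• Detail-record filter queries
• Columnar scan queries
• Integration with the open-source ecosystem
No single storage option in the current big data ecosystem meets all of these requirements at the same time!

Typical scenario 1: columnar scan query (full scan)
• No filter conditions; only aggregation and similar computation
• Queries only a few columns
• Typical use cases:
  • Data cleansing
  • Log analysis
[Figure: table grid with columns C1 to C7 and rows R1 to R10.]

Typical scenario 2: detail-record filter query (small scan)
• Fast filtering by key (similar to HBase)
• Combinations of multiple filter conditions, returning all columns
• Requires second-level query response
• Typical use cases:
  • Operations and maintenance queries
  • User behavior analysis
[Figure: table grid with columns C1 to C7 and rows R1 to R10.]

Typical scenario 3: multi-dimensional ad hoc analysis
• Aggregation
• Multi-dimensional OLAP analysis
• Low-latency ad hoc queries
• Typical use cases:
  • Dashboard reports
  • Ad hoc analysis
[Figure: table grid with columns C1 to C7 and rows R1 to R11.]

Why start the CarbonData project?
• Detail-record filter query (small scan)
• Columnar scan query (full scan)
• Multi-dimensional ad hoc analysis (OLAP analysis)
CarbonData: one copy of the data serves all cases.
Apache CarbonData lets a single copy of the data meet several business requirements at once. Connected to the Spark engine, it forms a distributed multi-dimensional analysis solution.

CarbonData design approach
• Distributed capability
• Fast queries with second-level response
• Efficient data storage
• Seamless integration with the big data ecosystem
Open source is about building an ecosystem. CarbonData is a storage-layer technology; to deliver its value it must be integrated effectively with the compute and query layers, forming an end-to-end ecosystem.

CarbonData's distinctive features
• Multiple indexes (MDK, min-max, inverted) to locate target data quickly
• Dictionary encoding to reduce compute overhead
• Data update support, IUD (insert, update, delete), in development
• Seamless integration with the big data ecosystem, with all the benefits of HDFS such as distribution and reliability

Introduction to the multi-dimensional key index: the data is the index (multi-dimensional keys)

Source data:

Years  Quarters  Months  Territory  Country    Quantity  Sales
2003   QTR1      Jan     EMEA       Germany    142       11,432
2003   QTR1      Jan     APAC       China      541       54,702
2003   QTR1      Jan     EMEA       Spain      443       44,622
2003   QTR1      Feb     EMEA       Denmark    545       58,871
2003   QTR1      Feb     EMEA       Italy      675       56,181
2003   QTR1      Mar     APAC       India      52        9,749
2003   QTR1      Mar     EMEA       UK         570       51,018
2003   QTR1      Mar     Japan      Japan      561       55,245
2003   QTR2      Apr     APAC       Australia  525       50,398
2003   QTR2      Apr     EMEA       Germany    144       11,532

Blocklet logical view (dictionary-encoded dimensions : measures):

[1,1,1,1,1]:[142,11432]
[1,1,1,3,2]:[541,54702]
[1,1,1,1,3]:[443,44622]
[1,1,2,1,4]:[545,58871]
[1,1,2,1,5]:[675,56181]
[1,1,3,3,6]:[52,9749]
[1,1,3,1,7]:[570,51018]
[1,1,3,2,8]:[561,55245]
[1,2,4,3,9]:[525,50398]
[1,2,4,1,1]:[144,11532]

Sort (MDK index):

[1,1,1,1,1]:[142,11432]
[1,1,1,1,3]:[443,44622]
[1,1,1,3,2]:[541,54702]
[1,1,2,1,4]:[545,58871]
[1,1,2,1,5]:[675,56181]
[1,1,3,1,7]:[570,51018]
[1,1,3,2,8]:[561,55245]
[1,1,3,3,6]:[52,9749]
[1,2,4,1,1]:[144,11532]
[1,2,4,3,9]:[525,50398]

Sorted MDK index, stored by column:

C1  C2  C3  C4  C5  C6   C7
1   1   1   1   1   142  11432
1   1   1   1   3   443  44622
1   1   1   3   2   541  54702
1   1   2   1   4   545  58871
1   1   2   1   5   675  56181
1   1   3   1   7   570  51018
1   1   3   2   8   561  55245
1   1   3   3   6   52   9749
1   2   4   1   1   144  11532
1   2   4   3   9   525  50398

Encoding
• Columnar indexing and sorting
• Efficient data compression (1/3)

Blocklet physical view (sort column within column chunk); each dimension entry is [value|row id]:

C1      C2      C3      C4      C5      C6     C7
[1|1]   [1|1]   [1|1]   [1|1]   [1|1]   [142]  [11432]
[1|2]   [1|2]   [1|2]   [1|2]   [1|9]   [443]  [44622]
[1|3]   [1|3]   [1|3]   [1|4]   [2|3]   [541]  [54702]
[1|4]   [1|4]   [2|4]   [1|5]   [3|2]   [545]  [58871]
[1|5]   [1|5]   [2|5]   [1|6]   [4|4]   [675]  [56181]
[1|6]   [1|6]   [3|6]   [1|9]   [5|5]   [570]  [51018]
[1|7]   [1|7]   [3|7]   [2|7]   [6|8]   [561]  [55245]
[1|8]   [1|8]   [3|8]   [3|3]   [7|6]   [52]   [9749]
[1|9]   [2|9]   [4|9]   [3|8]   [8|7]   [144]  [11532]
[1|10]  [2|10]  [4|10]  [3|10]  [9|10]  [525]  [50398]

Run-length encoding and compression; each column is stored as (d, r) pairs, first for the values and then for the row ids:

C1  values: (1,10)
C2  values: (1,8) (2,2); row ids: (1,10)
C3  values: (1,3) (2,2) (3,3) (4,2); row ids: (1,10)
C4  values: (1,6) (2,1) (3,3); row ids: (1,2) (4,3) (9,1) (7,1) (3,1) …
C5  values: (1,2) (2,1) (3,1) (4,1) (5,1) …; row ids: (1,1) (9,1) (3,1) (2,1) (4,1) …

Dim1 Block: 1 (1-10)
Dim2 Block: 1 (1-8), 2 (9-10)
Dim3 Block: 1 (1-3), 2 (4-5), 3 (6-8), 4 (9-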
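The sorting and encoding steps in the blocklet slides can be illustrated with a small Python sketch. It is not CarbonData's implementation; it only replays the slide's ten dictionary-encoded rows, and the function names (sort_by_mdk, to_columns, run_length_encode, sort_within_column) are hypothetical helpers introduced for this example. The sketch sorts rows by their multi-dimensional key, splits them into columns, run-length encodes each column, and sorts one column by value while keeping row ids.

```python
from itertools import groupby

# Dictionary-encoded rows copied from the slides: (MDK, measures).
ROWS = [
    ((1, 1, 1, 1, 1), (142, 11432)),
    ((1, 1, 1, 3, 2), (541, 54702)),
    ((1, 1, 1, 1, 3), (443, 44622)),
    ((1, 1, 2, 1, 4), (545, 58871)),
    ((1, 1, 2, 1, 5), (675, 56181)),
    ((1, 1, 3, 3, 6), (52, 9749)),
    ((1, 1, 3, 1, 7), (570, 51018)),
    ((1, 1, 3, 2, 8), (561, 55245)),
    ((1, 2, 4, 3, 9), (525, 50398)),
    ((1, 2, 4, 1, 1), (144, 11532)),
]


def sort_by_mdk(rows):
    """Sort rows by their multi-dimensional key (lexicographic tuple order)."""
    return sorted(rows, key=lambda row: row[0])


def to_columns(rows):
    """Transpose the keys of the rows into one list per dimension column."""
    return [list(col) for col in zip(*(key for key, _ in rows))]


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def sort_within_column(values):
    """Sort one column by value, keeping 1-based row ids to rebuild rows."""
    pairs = sorted(zip(values, range(1, len(values) + 1)))
    return [v for v, _ in pairs], [r for _, r in pairs]


sorted_rows = sort_by_mdk(ROWS)
columns = to_columns(sorted_rows)
encoded = [run_length_encode(col) for col in columns]
c4_values, c4_row_ids = sort_within_column(columns[3])

for name, runs in zip(["C1", "C2", "C3"], encoded):
    print(name, runs)
print("C4 values", run_length_encode(c4_values), "row ids", c4_row_ids)
```

Because the rows are sorted by key first, the leading dimensions collapse into a handful of runs, which is where the compression comes from. Later dimensions such as C4 only become compressible after the per-column sort, and that sort is why row ids have to be stored alongside the values.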
Title: [Huawei] Apache CarbonData: Achieving Second-Level Response for Ad Hoc Big Data Queries, by Liang Chen (陈亮)
Link: https://www.777doc.com/doc-24400 .html