Apache Kylin: OLAP on Hadoop

• What's Apache Kylin?
• Tech Highlights
• Performance
• Roadmap
• Q&A

Extreme OLAP Engine for Big Data
Kylin is an open source Distributed Analytics Engine from eBay that provides a SQL interface and multi-dimensional analysis (OLAP) on Hadoop, supporting extremely large datasets.

What's Kylin
kylin /ˈkiːˈlɪn/ 麒麟
--n. (in Chinese art) a mythical animal of composite form
• Open sourced on Oct 1st, 2014
• Accepted as an Apache Incubator Project on Nov 25th, 2014

Big Data Era
• More and more data becoming available on Hadoop
• Limitations in existing Business Intelligence (BI) Tools
  – Limited support for Hadoop
  – Data size growing exponentially
  – High latency of interactive queries
  – Scale-Up architecture
• Challenges to adopt Hadoop as interactive analysis system
  – Majority of analyst groups are SQL savvy
  – No mature SQL interface on Hadoop
  – OLAP capability on Hadoop ecosystem not ready yet

Why not Build an engine from scratch?

Features Highlights
• Extreme Scale OLAP Engine
  – Kylin is designed to query 10+ billions of rows on Hadoop
• ANSI SQL Interface on Hadoop
  – Kylin offers ANSI SQL on Hadoop and supports most ANSI SQL query functions
• Seamless Integration with BI Tools
  – Kylin currently offers integration capability with BI Tools like Tableau
• Interactive Query Capability
  – Users can interact with Hive tables at sub-second latency
• MOLAP Cube
  – Define a data model from Hive tables and pre-build in Kylin
• Scale Out Architecture
  – Query server cluster supports thousands of concurrent users and provides high availability

Features Highlights…
• Compression and Encoding Support
• Incremental Refresh of Cubes
• Approximate Query Capability for distinct count (HyperLogLog)
• Leverage HBase Coprocessor for query latency
• Job Management and Monitoring
• Easy Web interface to manage, build, monitor and query cubes
• Security capability to set ACL at Cube/Project Level
• Support LDAP Integration

Cube Designer

Job Management

Query and Visualization

Tableau Integration

Who are using Kylin
• eBay

  Case                          Cube Size   Raw Records
  User Session Analysis         26 TB       28+ billion rows
  Classified Traffic Analysis   21 TB       20+ billion rows
  GeoX Behavior Analysis        560 GB      1.2+ billion rows

  90% query 5 seconds
• Baidu
  – Baidu Map internal analysis
• Many other Proof of Concepts
  – Bloomberg Law, British GAS, JD, Microsoft, StubHub, Tableau…
Kylin Architecture Overview
[Architecture diagram. Labels:]
• Clients: SQL-Based Tool (BI Tools: Tableau…) over JDBC/ODBC; 3rd Party App (Web App, Mobile…) over REST API; both issue SQL
• Kylin: REST Server, Query Engine, Routing, Metadata, Cube Build Engine (MapReduce…)
• Storage: Hadoop Hive, Star Schema Data (Mid Latency - Minutes); OLAP Cube / Data Cube (HBase), Key Value Data (Low Latency - Seconds)
• Data flows: Online Analysis Data Flow; Offline Data Flow
• Clients/Users interact with Kylin via SQL
• OLAP Cube is transparent to users

Data Modeling
[Diagram. Labels:]
• Source Star Schema: Fact, Dim, Dim, Dim
• Mapping Cube Metadata: Cube: …, Fact Table: …, Dimensions: …, Measures: …, Storage(HBase): …
• Target HBase Storage: Row Key, Column Family, Column; rows row A, row B, row C; values Val1, Val2, Val3
• Roles: End User, Cube Modeler, Admin

OLAP Cube – Balance between Space and Time
0-D (apex) cuboid
1-D cuboids: time; item; location; supplier
2-D cuboids: time,item; time,location; time,supplier; item,location; item,supplier; location,supplier
3-D cuboids: time,item,location; time,item,supplier; time,location,supplier; item,location,supplier
4-D (base) cuboid: time,item,location,supplier
• Base vs. aggregate cells; ancestor vs. descendant cells; parent vs. child cells
  1. (9/15, milk, Urbana, Dairy_land) - time, item, location, supplier
  2. (9/15, milk, Urbana, *) - time, item, location
  3. (*, milk, Urbana, *) - item, location
  4. (*, milk, Chicago, *) - item, location
  5. (*, milk, *, *) - item
• Cuboid = one combination of dimensions
• Cube = all combinations of dimensions (all cuboids)

Cube Build Job Flow

How To Store Cube? – HBase Schema

How to Query Cube? – Calcite
• Dynamic data management framework.
• Formerly known as Optiq, Calcite is an Apache incubator project, used by Apache Drill and Apache Hive, among others.

Kylin Extensions on Calcite
• Metadata SPI
  – Provide table schema from Kylin metadata
• Optimize Rule
  – Translate the logic operator into Kylin operator
• Relational Operator
  – Find right cube
  – Translate SQL into storage engine API call
  – Generate physical execute plan by linq4j java implementation
• Result Enumerator
  – Translate storage engine result into java implementation result
• SQL Function
  – Add HyperLogLog for distinct count
  – Implement date time related functions (e.g. Quarter)

Query Engine – Kylin Explain Plan

SELECT test_cal_dt.week_beg_dt,
       test_category.category_name,
       test_category.lvl2_name,
       test_category.lvl3_name,
       test_kylin_fact.lstg_format_name,
       test_sites.site_name,
       SUM(test_kylin_fact.price) AS GMV,
       COUNT(*) AS TRANS_CNT
FROM test_kylin_fact
LEFT JOIN test_cal_dt
       ON test_kylin_fact.cal_dt = test_cal_dt.cal_dt
LEFT JOIN test_category
       ON test_kylin_fact.leaf_categ_id = test_category.leaf_categ_id
      AND test_kylin_fact.lstg_site_id = test_category.site_id
LEFT JOIN test_sites
       ON test_kylin_fact.lstg_site_id = test_sites.site_id
WHERE test_kylin_fact.seller_id = 123456
   OR test_kylin_fact.lstg_format_name = 'New'
GROUP BY test_cal_dt.week_beg_dt,
         test_category.category_name,
         test_category.lvl2_name,
         test_category.lvl3_name,
         test_kylin_fact.lstg_format_name,
         test_sites.site_name

OLAPToEnumerableConverter
  OLAPProjectRel(WEEK_BEG_DT=[$0], category_name=[$1], CATEG_LVL2_NAME=[$2], CATEG_LVL3_NAME=[$3], LSTG_FORMAT_NAME=[$4], SITE_NAME=[$5], GMV=[CASE(=($7, 0), null, $6)], TRANS_CNT=[$8])
    OLAPAggregateRel(group=[{0, 1, 2, 3, 4, 5}], agg#0=[$SUM0($6)], agg#1=[
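
The cuboid lattice from the OLAP Cube slide (time, item, location, supplier) can be illustrated with a small sketch. This is not Kylin's build engine, only a plain-Python illustration of the idea that a cube is the set of all dimension combinations and that an aggregate cell replaces dropped dimensions with `*`. The function names, the `sales` measure, and the sample rows are hypothetical and chosen only for the example.

```python
from itertools import combinations

# Dimension names taken from the OLAP Cube slide.
DIMENSIONS = ("time", "item", "location", "supplier")


def enumerate_cuboids(dimensions):
    """Group every cuboid (subset of dimensions) by its dimensionality k."""
    return {k: list(combinations(dimensions, k))
            for k in range(len(dimensions) + 1)}


def aggregate(rows, dimensions, cuboid, measure):
    """Roll base rows up to one cuboid; dropped dimensions become '*'."""
    totals = {}
    for row in rows:
        key = tuple(row[d] if d in cuboid else "*" for d in dimensions)
        totals[key] = totals.get(key, 0) + row[measure]
    return totals


# Hypothetical base-cuboid rows; "sales" is an assumed measure.
ROWS = [
    {"time": "9/15", "item": "milk", "location": "Urbana",
     "supplier": "Dairy_land", "sales": 10},
    {"time": "9/15", "item": "milk", "location": "Chicago",
     "supplier": "Dairy_land", "sales": 7},
    {"time": "9/16", "item": "milk", "location": "Urbana",
     "supplier": "Dairy_land", "sales": 5},
]

lattice = enumerate_cuboids(DIMENSIONS)
# 4 dimensions give 2**4 = 16 cuboids, from the 0-D apex to the 4-D base.
cuboid_count = sum(len(cuboids) for cuboids in lattice.values())

# Aggregate cells of the (item, location) cuboid, e.g. (*, milk, Urbana, *).
by_item_location = aggregate(ROWS, DIMENSIONS, ("item", "location"), "sales")
```

Because the number of cuboids doubles with every added dimension, pre-building all of them trades storage space for query time, which is the balance the slide title refers to.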
Document title: Applying Apache Kylin in Big Data Systems