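Hive Advanced Programming (天照)

Agenda
• Hive Components
• MapReduce
• HiveQL
• Hive Optimizations
• SQL Optimization

Hive: Components
[Figure: Hive architecture. Interfaces: Hive CLI (DDL, queries, browsing), Web UI, Thrift API. Query processing: Parser, Planner, Optimizer, Execution. MetaStore backed by a DB. SerDe (Thrift, CSV, JSON, ...). Execution runs as MapReduce jobs over HDFS.]

(Simplified) MapReduce Review
  Stage            Machine 1                        Machine 2
  Input            <k1,v1> <k2,v2> <k3,v3>          <k4,v4> <k5,v5> <k6,v6>
  Local Map        <nk1,nv1> <nk2,nv2> <nk3,nv3>    <nk2,nv4> <nk2,nv5> <nk1,nv6>
  Global Shuffle   <nk2,nv4> <nk2,nv5> <nk2,nv2>    <nk1,nv1> <nk3,nv3> <nk1,nv6>
  Local Sort       <nk2,nv4> <nk2,nv5> <nk2,nv2>    <nk1,nv1> <nk1,nv6> <nk3,nv3>
  Local Reduce     <nk2,3>                          <nk1,2> <nk3,1>

HiveQL – Join
• SQL:
  INSERT INTO TABLE pv_users
  SELECT pv.pageid, u.age
  FROM page_view pv JOIN user u ON (pv.userid = u.userid);

  page_view
  pageid  userid  time
  1       111     9:08:01
  2       111     9:08:13
  1       222     9:08:14

  user
  userid  age  gender
  111     25   female
  222     32   male

  page_view JOIN user = pv_users
  pageid  age
  1       25
  2       25
  1       32

HiveQL – Join in MapReduce
• Map: every row is emitted with the join key (userid) as key and a value tagged with its source table (1 = page_view, 2 = user)
  page_view: 111 → <1,1>; 111 → <1,2>; 222 → <1,1>   (value = <tag, pageid>)
  user:      111 → <2,25>; 222 → <2,32>              (value = <tag, age>)
• Shuffle/Sort: all values for one key reach the same reducer
  Reducer 1: 111 → <1,1>, <1,2>, <2,25>
  Reducer 2: 222 → <1,1>, <2,32>
• Reduce: each page_view value is combined with each user value of the same key
  Reducer 1 emits (pageid, age): (1, 25), (2, 25)
  Reducer 2 emits (pageid, age): (1, 32)

HiveQL – Group By
• SQL:
  ▪ INSERT INTO TABLE pageid_age_sum
  ▪ SELECT pageid, age, count(1)
  ▪ FROM pv_users
  ▪ GROUP BY pageid, age;

  pv_users
  pageid  age
  1       25
  2       25
  1       32
  2       25

  pageid_age_sum
  pageid  age  count
  1       25   1
  2       25   2
  1       32   1

HiveQL – Group By in MapReduce
• Map (key = <pageid, age>, value = 1)
  Mapper 1 reads (1, 25), (2, 25) → <1,25> 1; <2,25> 1
  Mapper 2 reads (1, 32), (2, 25) → <1,32> 1; <2,25> 1
• Shuffle/Sort
  Reducer 1: <1,25> 1; <1,32> 1
  Reducer 2: <2,25> 1; <2,25> 1
• Reduce (sum the values of each key) → pageid_age_sum
  Reducer 1: (1, 25, 1), (1, 32, 1)
  Reducer 2: (2, 25, 2)

HiveQL – Group By with Distinct
• SQL:
  – SELECT pageid, COUNT(DISTINCT userid)
  – FROM page_view GROUP BY pageid

  page_view
  pageid  userid  time
  1       111     9:08:01
  2       111     9:08:13
  1       222     9:08:14
  2       111     9:08:20

  result
  pageid  count_distinct_userid
  1       2
  2       1

Hive Optimizations
Efficient execution of SQL on MapReduce

(Simplified) MapReduce Revisit
(Same diagram as the MapReduce Review slide: local map, global shuffle, local sort and local reduce on two machines.)

Hive Optimizations – Merge Sequential MapReduce Jobs
• SQL:
  – FROM (a join b on a.key = b.key) join c on a.key = c.key SELECT …

  A            B            C
  key  av      key  bv      key  cv
  1    111     1    222     1    333

  A JOIN B (MapReduce) → AB        AB JOIN C (MapReduce) → ABC
  key  av   bv                      key  av   bv   cv
  1    111  222                     1    111  222  333

• Both joins use the same key (a.key), so Hive can run them as one MapReduce job instead of the two sequential jobs shown.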
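The tagged reduce-side join from the Join in MapReduce slide, and the reason the A/B/C query needs only one shuffle, can be simulated in a few lines of Python. This is a minimal in-memory sketch, not Hive's implementation: the function names (`map_phase`, `shuffle_sort`, `reduce_join`, `reduce_side_join`) are hypothetical, rows are plain dicts, and the reducer buffers every table's rows, whereas Hive buffers all but the last table and streams the last one.

```python
from collections import defaultdict
from itertools import product


def map_phase(tables):
    """Map: emit (join_key, (tag, row)) for every row of every table.

    `tables` is a list of (key_column, rows); the 1-based list position is the
    table tag, like the <1, ...> / <2, ...> values on the slide.
    """
    for tag, (key_col, rows) in enumerate(tables, start=1):
        for row in rows:
            yield row[key_col], (tag, row)


def shuffle_sort(pairs):
    """Shuffle/sort: collect all tagged values of a key, keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_join(values, n_tables):
    """Reduce: split one key's values by tag and emit their cross product.

    Inner join: if any table has no row for the key, nothing is emitted.
    Unlike Hive, every table's rows are buffered here (none is streamed).
    """
    by_tag = defaultdict(list)
    for tag, row in values:
        by_tag[tag].append(row)
    for combo in product(*(by_tag[t] for t in range(1, n_tables + 1))):
        merged = {}
        for row in combo:
            merged.update(row)  # assumes column names do not clash (except the key)
        yield merged


def reduce_side_join(tables):
    """One map -> shuffle/sort -> reduce pass for any number of tables on one key."""
    output = []
    for _key, values in shuffle_sort(map_phase(tables)):
        output.extend(reduce_join(values, len(tables)))
    return output


page_view = [
    {"pageid": 1, "userid": 111, "time": "9:08:01"},
    {"pageid": 2, "userid": 111, "time": "9:08:13"},
    {"pageid": 1, "userid": 222, "time": "9:08:14"},
]
user = [
    {"userid": 111, "age": 25, "gender": "female"},
    {"userid": 222, "age": 32, "gender": "male"},
]
pv_users = [(r["pageid"], r["age"])
            for r in reduce_side_join([("userid", page_view), ("userid", user)])]
print(pv_users)  # [(1, 25), (2, 25), (1, 32)]
```

Because every table is tagged and keyed on the same column, adding a third table costs no extra shuffle: `reduce_side_join([("key", A), ("key", B), ("key", C)])` with the one-row A, B and C tables from the Merge Sequential MapReduce Jobs slide returns `[{"key": 1, "av": 111, "bv": 222, "cv": 333}]` in a single pass.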
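Hive Optimizations – Share Common Read Operations
• Extended SQL
  ▪ FROM pv_users
  ▪ INSERT INTO TABLE pv_pageid_sum
  ▪ SELECT pageid, count(1)
  ▪ GROUP BY pageid
  ▪ INSERT INTO TABLE pv_age_sum
  ▪ SELECT age, count(1)
  ▪ GROUP BY age;
• The single FROM clause lets both inserts share one read of pv_users:
  pv_users (pageid, age): (1, 25), (2, 32)
  → MapReduce → pv_pageid_sum (pageid, count): (1, 1), (2, 1)
  → MapReduce → pv_age_sum (age, count): (25, 1), (32, 1)

Hive Optimizations – Map Join
• Map joins
  – User-specified small tables are stored in hash tables on the mapper, backed by jdbm
  – No reducer needed
  INSERT INTO TABLE pv_users
  SELECT /*+ MAPJOIN(pv) */ pv.pageid, u.age
  FROM page_view pv JOIN user u ON (pv.userid = u.userid);

HiveQL – Map Join
• The table named in the MAPJOIN hint (page_view) is loaded into a hash table keyed by userid
• Each mapper reads user rows, probes the hash table and writes pv_users directly:
  pageid  age
  1       25
  2       25
  1       32

Group By Optimizations
• Map-side partial aggregations
  – Hash-based aggregates
  – Serialized keys/values in hash tables
  – 90% speed improvement on the query SELECT count(1) FROM t;

Parameters
• hive.map.aggr=true (use map-side aggregation for GROUP BY)
• hive.groupby.skewindata=false (optimize GROUP BY for skewed data)
• hive.groupby.mapaggr.checkinterval=100000 (rows processed between checks of the map-side hash table)
• hive.map.aggr.hash.percentmemory=0.5 (fraction of mapper memory for the aggregation hash table)
• hive.map.aggr.hash.min.reduction=0.5 (map-side aggregation is turned off if hash table size / input rows exceeds this ratio)

Multi Group By
  FROM pv_users
  INSERT OVERWRITE TABLE pv_gender_sum
    SELECT gender, count(DISTINCT userid), count(userid)
    GROUP BY gender
  INSERT OVERWRITE TABLE pv_age_sum
    SELECT age, count(DISTINCT userid)
    GROUP BY age;

HiveQL – Group By in MapReduce (Multi Group By)
  pv_users
  gender  age  userid
  M       25   1
  M       25   2
  M       25   1
  M       24   1
  F       24   2
  F       24   1
• First stage: Key = userid, Value = (gender, age); each reducer computes partial results for both group-bys
  Reducer 1 (userid 1): gender (dist, count): M (1, 3), F (1, 1); age dist: 24 → 1, 25 → 1
  Reducer 2 (userid 2): gender (dist, count): M (1, 1), F (1, 1); age dist: 25 → 1, 24 → 1
• Second stage: the partial results are summed per group
  pv_gender_sum: M (dist 2, count 4), F (dist 2, count 2)
  pv_age_sum: 24 → 2, 25 → 2

Load balancing for data skew
• Group By data skew
  – skewindata optimization
  – Usage:
    ▪ set hive.groupby.skewindata=true;

Hive Optimizations – Load Balance Problem
  pv_users (pageid, age): (1, 25), (1, 25), (1, 25), (2, 32), (1, 25)
  → Map-Reduce (rows spread over reducers) → pageid_age_partial_sum (pageid, age, count): (1, 25, 2), (2, 32, 1), (1, 25, 2)
  → Map-Reduce (grouped by key) → pageid_age_sum (pageid, age, count): (1, 25, 4), (2, 32, 1)

SQL Optimization
• Data skew
• Join order
• Map-only

Skew
• Data skew
  – What causes skew?
    • GROUP BY / DISTINCT
    • JOIN

Join Order
• Memory optimization
  – Driving table
    • Use the large table as the driving (streamed) table to avoid running out of memory
    • In a join, the rightmost table is the driving table
    • A MapJoin ignores join order and uses the large table as the driving table
    • The /*+ STREAMTABLE(alias) */ hint selects which table is streamed

Map Only
• Map-only jobs
  – Characteristics
    • No JOIN, GROUP BY, ORDER BY, SORT BY, etc., so there is no reduce phase
    • Each map writes its own output file; with large input there are many maps and therefore many output files
  – Drawbacks
    • The next job that reads this job's output gets a very large number of maps
    • Fetching the results is slow

Course Review and Summary
• Hive Components
• MapReduce
• HiveQL
• Hive Optimizations
• SQL Optimization

Thank you!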
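The map-side partial aggregation (hive.map.aggr=true) and the two-job skew plan (hive.groupby.skewindata=true) from the Group By Optimizations and Load Balance slides can be modeled with a short Python simulation of `GROUP BY pageid, age` with `count(1)`. It is an illustrative sketch under stated assumptions, not Hive's code: the helper names are hypothetical, group keys are `(pageid, age)` tuples, `hash(key) % n` stands in for Hive's partitioner, the split of pv_users into two mapper inputs is made up, and "load" is simply the number of shuffled records each reducer receives.

```python
import random
from collections import Counter


def mapper(split, map_aggr):
    """Emit (group_key, count) records for one input split.

    With map_aggr=True the mapper first aggregates in an in-memory hash table
    (the idea behind hive.map.aggr=true), emitting one record per distinct key.
    """
    if map_aggr:
        return list(Counter(split).items())
    return [(key, 1) for key in split]


def run_job(records, n_reducers, partition):
    """Shuffle records to reducers via `partition` and sum counts per key.

    Returns (per-reducer Counters, number of records each reducer received).
    """
    reducers = [Counter() for _ in range(n_reducers)]
    load = [0] * n_reducers
    for key, count in records:
        r = partition(key)
        reducers[r][key] += count
        load[r] += 1
    return reducers, load


def merge(reducers):
    """Combine the per-reducer outputs into one result dict."""
    total = Counter()
    for part in reducers:
        total.update(part)
    return dict(total)


def group_by_count(splits, n_reducers=2, map_aggr=True):
    """One job: partition by hash of the group key, as a plain GROUP BY does."""
    records = [kv for split in splits for kv in mapper(split, map_aggr)]
    reducers, load = run_job(records, n_reducers, lambda k: hash(k) % n_reducers)
    return merge(reducers), load


def group_by_count_skew(splits, n_reducers=2, seed=0):
    """Two jobs, modeled on hive.groupby.skewindata=true.

    Job 1 sends rows to random reducers and produces partial sums, so a hot key
    can be spread over several reducers; job 2 partitions the partial sums by key.
    """
    rng = random.Random(seed)
    records = [kv for split in splits for kv in mapper(split, map_aggr=False)]
    partial, load1 = run_job(records, n_reducers, lambda k: rng.randrange(n_reducers))
    partial_records = [kv for part in partial for kv in part.items()]
    final, load2 = run_job(partial_records, n_reducers, lambda k: hash(k) % n_reducers)
    return merge(final), load1, load2


# pv_users rows from the Load Balance slide as (pageid, age) keys, in two hypothetical splits.
splits = [[(1, 25), (1, 25), (1, 25)], [(2, 32), (1, 25)]]
# The counts are {(1, 25): 4, (2, 32): 1} in every case; only the loads differ.
print(group_by_count(splits, map_aggr=False))
print(group_by_count(splits, map_aggr=True))
print(group_by_count_skew(splits))
```

In this model, map-side aggregation shrinks the shuffle from 5 records to 3. The skew plan does not reduce total work; it lets the rows of the hot key (1, 25) be spread over several reducers in the first job, at the cost of a second job.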