ISSN 1000-9825, CODEN RUXUEW
E-mail: jos@iscas.ac.cn
Journal of Software, 2014,25(9):1889-1908 [doi: 10.13328/j.cnki.jos.004674]
Journal of Software, Vol.25, No.9, September 2014
Tel/Fax: +86-10-62562563

Survey on Big Data System and Analytic Technology

CHENG Xue-Qi, JIN Xiao-Long, WANG Yuan-Zhuo, GUO Jia-Feng, ZHANG Tie-Ying, LI Guo-Jie

(Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, The Chinese Academy of Sciences, Beijing 100190, China)

Corresponding author: JIN Xiao-Long, E-mail: jinxiaolong@ict.ac.cn

Abstract: This paper first introduces the key features of big data in different processing modes and their typical application scenarios, as well as corresponding representative processing systems. It then summarizes three development trends of big data processing systems. Next, the paper gives a brief survey on system-supported analytic technologies and applications (including deep learning, knowledge computing, social computing, and visualization), and summarizes the key roles of individual technologies in big data analysis and understanding. Finally, the paper lays out three grand challenges of big data processing and analysis, i.e., data complexity, computation complexity, and system complexity. Potential ways for dealing with each complexity are also discussed.

Keywords: big data; data analysis; deep learning; knowledge computing; social computing; visualization

CLC number: TP301

Citation: Cheng XQ, Jin XL, Wang YZ, Guo JF, Zhang TY, Li GJ. Survey on big data system and analytic technology. Ruan Jian Xue Bao/Journal of Software, 2014,25(9):1889-1908 (in Chinese).

Funding: 973 Program (2014CB340401, 2012CB316303); [...] (61232010, 61100175, 61173008, 61202214); [...] (Z121101002512063)

Received 2014-05-09; Accepted 2014-07-01

[...] Nature [...] Science [...]: "[...]" [1]. [...] the physical world, human society, and cyberspace [2,3]. [...] IDC [...] 2020 [...] 5.3 [...] 2013~2020 [...] IT [...] 90% [...] (second economy [4]). Arthur [...] 2011 [...]. [...] 100 [...]. Arthur [...] 2030 [...].

[...] 5V: volume, velocity, variety, veracity, and value. [...] variety, velocity, and veracity [...]. [...] Google, Facebook, LinkedIn, Microsoft [...]. [...] 3 [...].

1 Big data processing and systems

Mayer-Schönberger [...] [5]. [...] [6] [...] 4 [...].

1.1 Batch data processing systems

1.1.1 Features and typical applications

(1) Features: 3 [...]. [...] TB to PB [...].

(2) Typical applications: [...] 3 [...] [7-12]. [...]:

(a) [...]: Facebook [...].
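(b) [...].
(c) [...]: Google, Yahoo! [...]. [...] IT [...].

[...]:

(a) [...].
(b) [...].

1.1.2 Representative processing systems

Google published GFS [13] in 2003 and MapReduce [14] in 2004 [...] Web [...]. [...] Google [...] in 2006 [...] Nutch [...] Hadoop [15], whose two components are HDFS and MapReduce. [...] Hadoop [...] HDFS [...] MapReduce [...]. Hadoop [...] IT [...] HDFS, MapReduce [...] Hadoop [...].

MapReduce [...] 3 [...] [16,17]. [...] MapReduce [...].

1.2 Streaming data processing systems

Google [...] 2010 [...] Dremel [...]. [...] PB [...].

1.2.1 Features and typical applications

(1) Features: [...] (IP) [...] [18]. [...] SQL [...].

(2) Typical applications [6]:

(a) [...]: [...] Web [...].
(b) [...]: [...] (BI) [...] BI [...].

1.2.2 Representative processing systems

[...] Twitter's Storm, Facebook's Scribe, LinkedIn's Samza, Cloudera's Flume, and Apache Nutch.

Twitter's Storm

Storm [19] [...]. In Storm, a data source is called a Spout. Spouts pass data to Bolts [...], and the network of Spouts and Bolts is called a Topology. [...] RPC [...].

Storm has 3 main components:

(a) Nimbus [...];
(b) Zookeeper [...] Storm [...];
(c) Supervisor [...] Worker [...] Topology [...] Zookeeper [...] Nimbus.

[...] Storm [...] Zookeeper [...] Supervisor [...].

Storm has the following features:

(a) [...]: Storm [...] MapReduce [...]. Storm [...] Topology [...] Spout [...] Bolt [...] Topology;
(b) [...]: Storm [...] Zookeeper [...] Topology [...];
(c) [...]: Storm [...] Nimbus [...] Zookeeper [...];
(d) [...]: Storm [...] ZeroMQ [...].

LinkedIn's Samza

LinkedIn [...] Kafka [20,21] [...]. Kafka [...] 4 [...] (broker) [...] Key-Value [...] Broker [...] Topic [...]. In 2013, LinkedIn [...] Kafka [...] YARN [...] Samza. Samza [...] Kafka [...] MapReduce [...] HDFS.

Samza has 3 layers: (Kafka), (YARN), and (Samza API). [...] Samza [...] YARN [...] Kafka. Samza [...] Yarn [...] TaskRunner [...] StreamTasks. [...] Kafka [...] Broker.

Samza has the following features:

(a) [...]: Samza [...] YARN.
(b) [...]: Samza [...] Kafka [...].
(c) [...]: Samza [...]; Kafka [...]; YARN [...] Samza.

1.3 Interactive data processing systems

1.3.1 Features and typical applications

(1) Features: [...].

(2) Typical applications:

(a) [...]: [...] (DBMS) [...] (OLTP) and (OLAP). OLTP [...]; OLAP [...] (data warehouse) [...] (BI). [...] BI [...] Hive [22] and Pig [23].
(b) [...]: [...] Yahoo! [...] NoSQL [...] HBase [24]; MongoDB [25] [...] JSON. NoSQL [...] Join [...].

1.3.2 Representative processing systems

[...] Berkeley's Spark and Google's Dremel.

Berkeley's Spark

Spark [26] [...]. [...] MapReduce [...] I/O [...] Spark [...]. Spark [...] Hadoop API [...] Spark [...] Hadoop [...] 10~100 [...] [26]. Spark [...] Hadoop API [...] HDFS, HBase, SequenceFile [...]. Spark-Shell [...] Spark [...].

Spark has 3 features:

(a) [...]: Spark [...] Scala [...].
(b) [...]: Spark [...] HDFS [...].
(c) [...]: Spark [...] (RDD) [...] RDD [...] Scala [...].

[...] Spark [...] Hadoop [...] Cloudera, Pivotal, MapR, Hortonworks [...] Spark.

Google's Dremel

Dremel [27] [...] Google [...]. Dremel [...] PB [...]. [...] MapReduce [...] Dremel [...].

Dremel has 5 features:

(a) [...]: [...] PB [...] 100MB/s [...] 1s [...] 1TB [...].
(b) [...]: Dremel [...] MapReduce [...]. Dremel [...] GFS [...] MapReduce.
(c) [...]: Dremel [...] JSON [...] Join [...].
(d) [...]: [...] CPU [...].
(e) [...]: Dremel [...] Web [...] DBMS [...]. [...] DBMS [...] Dremel [...] SQL-like [...].

1.4 Graph data processing systems

1.4.1 Features and typical applications

(1) Features: [...] 3 [...].

(2) Typical applications:

(a) [...]: [...] Web 2.0 [...] (Facebook) [...] (Twitter) [...] E-mail [...] PageRank.
(b) [...]: [...] DNA [...].
(c) [...].

1.4.2 Representative graph data processing systems

[...] GraphLab, Giraph (based on Pregel), Neo4j, HyperGraphDB, InfiniteGraph, Cassovary, Trinity, and Grappa. [...] 3 [...]: Google's Pregel, Neo4j, and Trinity.

Google's Pregel

Pregel [28,29] [...] Google [...] BSP (bulk synchronous parallel) [...] (BFS), (SSSP), and PageRank. BSP [...] "[...]-[...]-[...]" [...] (superstep). [...] Pregel [...] "Vote to Halt" [...].

Pregel has 3 features:

(a) [...] (Master/Slave): Master [...] ID [...] Slave [...] Slave [...] Master;
(b) [...]: Pregel [...] Checkpoint [...] Master [...];
(c) [...] GFS [...] BigTable.

Apache [...] Google [...] 2010 [...] Pregel [...] Giraph [...] Facebook [...].

Neo4j

Neo4j [30] [...] ACID [...]. [...] Java [...]. Neo4j [...] OLAP [...].

Neo4j has 5 features:

(a) [...]: Neo4j [...] ACID;
(b) [...]: Neo4j [...];
(c) [...]: Neo4j [...];
(d) [...]: Neo4j [...] Java-API. JRuby/R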
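
The superstep model that Section 1.4.2 attributes to Pregel can be illustrated with a small single-process simulation. The sketch below computes single-source shortest paths (SSSP, one of the algorithms named in that section) in a vertex-centric style: in each superstep a vertex reads its incoming messages, may update its value and send messages to its neighbors, and otherwise votes to halt. The function name `pregel_sssp`, the adjacency-list format, and the choice to start with only the source vertex active are assumptions made for illustration; this is not Pregel's actual API, and it omits partitioning, the master/worker split, and checkpointing.

```python
import math
from collections import defaultdict


def pregel_sssp(edges, source):
    """Simulate BSP supersteps for single-source shortest paths.

    edges: dict mapping a vertex to a list of (neighbor, weight) pairs
           (hypothetical input format chosen for this sketch).
    Returns (distances, number_of_supersteps).
    """
    # Collect every vertex that appears as a source or a target.
    vertices = set(edges)
    for neighbors in edges.values():
        for neighbor, _ in neighbors:
            vertices.add(neighbor)

    dist = {v: math.inf for v in vertices}

    # Messages to be delivered at the start of the next superstep.
    inbox = defaultdict(list)
    inbox[source].append(0.0)  # simplification: only the source starts active

    supersteps = 0
    # A vertex is active only if it has incoming messages; every other
    # vertex has voted to halt. The run ends when no messages remain.
    while inbox:
        outbox = defaultdict(list)
        for vertex, messages in inbox.items():
            best = min(messages)
            if best < dist[vertex]:
                # Local computation: improve the value, then notify neighbors.
                dist[vertex] = best
                for neighbor, weight in edges.get(vertex, []):
                    outbox[neighbor].append(best + weight)
            # Otherwise the vertex sends nothing, i.e. it votes to halt.
        inbox = outbox  # global synchronization barrier between supersteps
        supersteps += 1
    return dist, supersteps


if __name__ == "__main__":
    graph = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
    distances, steps = pregel_sssp(graph, "a")
    print(distances, steps)
```

Swapping the two dictionaries at the end of each loop iteration plays the role of the barrier: no message sent in one superstep is visible before the next one begins, which is what lets each vertex be processed independently within a superstep.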