Abstract

1
Types of search engine [1]: (1) Yahoo; (2) robot-based search engines, e.g. AltaVista; (3) [...] [2].
1994: Yahoo, InfoSeek, AltaVista. [...] Information Retrieval [...] Web Mining [...] ranking [...]

1.4.1
1.4.2
[3]: (1) [...]; (2) Agent; (3) Agent [...]

Main points of the work:
1. User identification: IP, CGI.
2. [...]
3. [...]
4. URL, VSM, Web pages, information gain.
5. Agent.
6. IADWA, DDCA, ICLA, Naïve Bayes.
7. IADWA.

2
[4][5] [...] INSPEC, COMPENDEX [...] Web [6][27] [...] URL [28][7]

2.3.1
S1 = {q_1, q_2, ..., q_n}, with n and m [...]; S2 = {Q_1, Q_2, ..., Q_m}; S3 = {C_1, C_2, ..., C_m}.
For S2, the share contributed by the top x% is

$$\frac{\sum_{i=1}^{mx/100} C_i}{\sum_{j=1}^{m} C_j}$$

[...] 20% [...] 80% [...]

2.3.2
From S1, 1,000 [...]. For each i, A_i = {a_{i1}, ..., a_{i1000}} and T_i [...]:

$$C_i = \sum_{j=1}^{1000} C_{ij}, \qquad C_{ij} = \begin{cases} 1, & a_{ij} \in T_i \\ 0, & a_{ij} \notin T_i \end{cases}$$

2.3.3
n = 2000 [...] 10 [...] {p_1, ..., p_n}, C_1, ..., C_n:

$$p_i = \frac{C_i}{\sum_{j=1}^{n} C_j}$$

  1      2      3      4      5
  47%    12.1%  7.4%   5.0%   3.7%

[...] URL [...] 1, ..., 4 [...] 47% [...] the first 5 together account for about 75% [...] 100 [...]

2.3.4
URL [...] 2.3.1 [...] URL [...] (5a) [...] URL [2] [...] 100 [...] URL [...] 1/6, 1/3, 2/3 [...] URL

2.3.5
URL [...] 2.3.4 [...] Q_j, URL U_i, W_{ij} [...]
P_j = (W_{j1}, W_{j2}, ..., W_{jn}) [...] P = (W_1, W_2, ..., W_n)

$$\cos(P, P_j) = \frac{P \cdot P_j}{|P|\,|P_j|}$$

[...] 10,000 [...] (5b) [...] 0.8 [...] URL

2.3.6
Agent [...] VSM [8] [...] Agent [...] URL

$$P_i = T_i \log\left(\frac{N}{n_i} + 0.5\right)$$

T_i [...], N [...], n_i [...]. Normalized:

$$P_i = \frac{T_i \log\left(\frac{N}{n_i} + 0.5\right)}{\sqrt{\sum_{l=1}^{k} T_l^2 \log^2\left(\frac{N}{n_l} + 0.5\right)}}$$

[9] P, D, k [...] 30, 30, 20, 10 [...] 0.1, 0.05, 0.01 [...] 0.980, 0.99 [...]

$$SIM_1(P, D) = \cos(P, D) = \frac{p_1 d_1 + p_2 d_2 + \cdots + p_k d_k}{\sqrt{p_1^2 + p_2^2 + \cdots + p_k^2}\,\sqrt{d_1^2 + d_2^2 + \cdots + d_k^2}}$$

SIM_2(P, D) = [...]

[...] 0.04 [...] 0.0397 [...]
Agent: m URLs (Bookmark); for Agent j with W_j and p, the relevance R_j is

$$R_j = \cos(p, w_j) = \frac{\sum_k p_k w_{jk}}{|p|\,|w_j|}$$

3
Agent [...] n [10] [...] m URLs, Bookmark, SVM [...] Agent 1, Agent 2 [...]
IR, Web: Stanford Google [4], IBM Clever [11].

Google PageRank:

$$W_j = (1 - d) + d \sum_{i=1,\, i \neq j}^{N} \frac{l_{ij} W_i}{n_i}$$

where W_j is the weight of page j, l_{ij} is 0 or 1 for the link between i and j, n_i [...] page i, and d [...].

IBM Clever, HITS [12] (Hyperlink-Induced Topic Search): Clever [...] 1,000 to 5,000 [...]. For a page p with weights x_p and y_p:

$$x_p = \sum_{q \text{ such that } q \to p} y_q, \qquad y_p = \sum_{q \text{ such that } p \to q} x_q$$

[...] k [...] Clever 10 [...] 50% [...] Yahoo, AltaVista, Google, Clever [13]

3.3.1
Agent [...]
  scene: {
    TaskNo:
    Result information: {Result title: Result URL: Keywords: Author:};
    Relevance value:
    System recommends operation:
    User's operation:
  }
Relevance value [...] IADWA

3.3.2
Agent [...]

3.3.3
Agent, MAS [...] MAS = {Agent, Agent} [...]

3.4.1
[14] (1) (2) (3) (4) [...]

3.4.2
Agent [...] [15] N [...] d_i = (d_{i1}, d_{i2}, ..., d_{ij}, ..., d_{im})

$$W_{ij} = T_{ij} \log\left(\frac{N}{n_j} + 0.5\right)$$

T_{ij} [...] d_i [...] n_j [...] N [...] W [...]

[User identification: CGI, IP, ID, IE, COOKIE. Flowchart with YES/NO branches and an "IP + 1" step.]

Web pages: Head, Body [...]
(1) [16] Text Categorization, Web [...]
(2) Web Head [...]
(3) Web [...]
(4) Web [...]
[17] URL, Web, Stoplist, Stem (Porter [27]), vector space model [8].
D = {d_1, ..., d_i, ..., d_m}, d_i = (d_{i1}, ..., d_{ij}, ..., d_{in})
(1) $d_{ij} = 1$ if [...], $0$ otherwise, for i = 1, ..., m
(2) $d_{ij} = t_{ij}$ if [...], $0$ otherwise
(3) TFIDF: $d_{ij} = T_{ij} \log(N / n_j)$, with T_{ij} [...] T_j in d_i, N [...], n_j [...] T_j.

URL, VSM, m, Agent, VSM, t [...] information gain [18]:

$$G(t) = -\sum_{i=1}^{m} \Pr(C_i) \log \Pr(C_i) + \Pr(t) \sum_{i=1}^{m} \Pr(C_i \mid t) \log \Pr(C_i \mid t) + \Pr(\bar{t}) \sum_{i=1}^{m} \Pr(C_i \mid \bar{t}) \log \Pr(C_i \mid \bar{t})$$

Pr(C_i): C_i [...]; Pr(t): t [...]; Pr(\bar{t}): t [...]; Pr(C_i | t): t, C_i [...]; Pr(C_i | \bar{t}): t, C_i [...]
VC, CGI, HTML [...]

4.3 IADWA
IADWA [...] Agent (1) [20] generalization/specification [...] IADWA

(1) Distance on feature k, with weight B_k:

$$d_c(t_{ik}, t_{jk}) = \begin{cases} 0, & t_{ik} = t_{jk} \\ 1, & \text{else} \end{cases}$$

(2) J: assign to nearest [...]
(3) J_e, C_e:

$$d_c(D_i, D_j) = \frac{1}{|V|} \sum_{k=1}^{|V|} B_k\, d_c(t_{ik}, t_{jk})$$

$$P(t_j \mid C_e) = \frac{DF(t_j, C_e)}{|C_e|}, \qquad P(\tilde{t}_j \mid C_e) = 1 - \frac{DF(t_j, C_e)}{|C_e|}$$

J_e [...] max(P(t_j | C_e), P(\tilde{t}_j | C_e)) (e = 1, ..., P)
(4) [...]

DNCP [...] number of cluster procedure; DCP: dynamic clustering procedure.

1. DNCP
Input: D, simth, Lth (= simth), Nth, B_k (k = 1, ..., |V|)
Output: P and the cluster centres (e = 1, ..., P)
Begin: P = 1; ClusterCount[P] = 1; s_1 is assigned to C_P. For each s_i in D [...] ClusterCount[P]++ [...] D = d_c([...]); if (D [...] Lth) [...]
Finally, clusters whose ClusterCount is below Nth are deleted (DeleteCluster) and P is decremented. End.
Complexity: O(nm), with n for D and m for V.

2. DCP (dynamic clustering procedure)
Input: D, P and the cluster centres (e = 1, ..., P)
Output: [...]
  for each s_i in D:
      mdis = d_c(s_i, centre 1); ClusterID[s_i] = 1
      for ε = 2 to P:
          distn = d_c(s_i, centre ε)
          if distn < mdis: mdis = distn; ClusterID[s_i] = ε
  repeat
      move = 0
      for each s_i in D:
          mdis = d_c(s_i, centre 1); k = 1
          for ε = 2 to P:
              distn = d_c(s_i, centre ε)
              if distn < mdis: mdis = distn; k = ε
          if ClusterID(s_i) ≠ k:
              move = 1
              q = ClusterID(s_i)
              ClusterID(s_i) = k
              update the centres of clusters k and q
  until move = 0
Complexity: O(CnP), with n for D and C the number of repeat-until iterations.

Step 1 [30] [...] t ([21, 29]) [...] C, X [...]

Algorithm ICLA: Incremental Concept Lattice Algorithm
Input: concept lattice L, document feature to be added X, minimal support threshold t.
Output: updated lattice L and frequent itemset FIS
mark [...];
For each elem
Document title: 11 Development Status of Web Search Engines in China and Abroad