Running Title Searching the Web by Constrained Spr

Searching the Web by Constrained Spreading Activation

Fabio Crestani, Puay Leng Lee
Department of Computing Science, University of Glasgow
Glasgow G12 8QQ, Scotland
Tel. +44-(0)141-3306292
Fax. +44-(0)141-3304913
Email: {fabio,leepl}@dcs.gla.ac.uk

Running Title: Searching the Web by Constrained Spreading Activation.

Keywords (from ACM Computing Classification System): hypertext/hypermedia, information search and retrieval, spreading activation, query formulation, intelligent agents.

Abstract

Intelligent Information Retrieval is concerned with the application of intelligent techniques, such as semantic networks, neural networks and inference nets, to Information Retrieval. This field of research has seen a number of applications of Constrained Spreading Activation (CSA) techniques to domain knowledge networks. However, there has never been any application of these techniques to the World Wide Web. The Web is a very important information resource, but users find that looking for a relevant piece of information on the Web can be like "looking for a needle in a haystack". We were therefore motivated to design and develop a prototype system, WebSCSA (Web Search by CSA), that applies a CSA technique to retrieve information from the Web using an ostensive approach to querying, similar to query-by-example. In this paper we describe the system and its underlying model. Furthermore, we report on an experiment carried out with human subjects to evaluate the effectiveness of WebSCSA. We tested whether WebSCSA improves the retrieval of relevant information on top of the results of Web search engines, and how well WebSCSA serves as an agent browser for the user. The results of the experiments are promising and show that there is much potential for further research on the use of CSA techniques to search the Web.

1 Introduction

This paper is concerned with the application of Constrained Spreading Activation (CSA) techniques to retrieving information from the World Wide Web (hereafter referred to as the Web). The Web presents a formidable store of information. It is an interconnected system of over 7 million sites and their pages (as of December 1998), accessible through browsers such as Mosaic, Netscape Navigator or Microsoft's Internet Explorer. Although the Web is one of the newer additions to the Internet, it has gained popularity very quickly, becoming the second most frequently used feature of the Internet after electronic mail (Berners-Lee et al., 1992).

The information stored on the Web differs from the information traditionally dealt with by Information Retrieval (IR) systems in several respects.

Information organization. The Web is not organized, in the sense that associated or similar documents are not placed in close physical proximity, as collections are in a physical library or an archive. Internet directories such as Yahoo! help organize links to similar documents to ease the retrieval problem, but the categorization is often done manually, which is expensive and time-consuming. Since the Web is a hypertext/hypermedia system, and we do not possess the resources that Internet directories do, the natural way of reaching documents similar to a given document is to traverse the links on it. A retrieval tool for the Web should therefore exploit the links in Web documents (i.e. Web pages) in its search for documents relevant to a user request.

Information range. Some conventional IR systems contain specialized information, such as medical documentation or patents. Hence, such IR systems can sometimes exploit domain knowledge to enhance retrieval performance. In contrast, the subject range of information on the Web is very wide. Any retrieval program built for the Web must be flexible enough to retrieve information on a wide range of subjects, written in different natural languages. Retrieval models that exploit associations between documents are appropriate for retrieval on the Web because they do not restrict the topical range of the relevant information provided at the beginning of the search; they simply search for similar information regardless of the topic of the query (Ellis, 1996).

Change of content. The Web is a very dynamic information collection. Every second, changes are made to existing Web pages, and pages are added to or deleted from the Web. Conventional IR systems are less dynamic, and there is much more control over the changes made to the document collection. A retrieval system for the Web should be able to retrieve up-to-date documents and should not rely (at least not completely) on indexes that can become outdated very quickly.

In this paper we present a prototype Web search system that exploits the above differences between the documents usually managed by IR systems and those on the Web. The underlying IR model of this prototype is a variation of the model known as Associative Retrieval. Associative Retrieval was first introduced by Salton (1968) and is concerned with exploiting associations between information items at retrieval time. Associations are first determined using citations or statistical techniques (for example, term co-occurrence) and then used by complex retrieval functions. In the work presented in this paper we do not use citations or statistical associations; instead, we use the existing associations represented by hypertext links between Web documents. What we consider important in Associative Retrieval is the idea behind this form of retrieval, i.e. that it is possible to retrieve relevant documents by retrieving those that are explicitly associated with documents the user knows to be relevant.

The work presented in this paper integrates Associative Retrieval with Ostensive Retrieval. This novel approach to IR was proposed by Campbell and Van Rijsbergen (1996) and is concerned
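The associative idea described above, retrieving pages that are linked to pages the user already knows to be relevant, is what spreading activation over a hyperlink graph makes operational. The Python sketch below shows a generic constrained spreading activation pass; it is not WebSCSA's actual model. The function name `constrained_spreading_activation`, its parameters and default values, the additive activation rule and the toy link graph are all hypothetical assumptions chosen for illustration. The sketch applies three constraints of the kind commonly used with CSA: an activation threshold, a distance limit (a fixed number of pulses) and a fan-out limit that stops very broad hub pages from spreading activation.

```python
from collections import defaultdict


def constrained_spreading_activation(
    links, seeds, decay=0.5, threshold=0.1, max_pulses=2, max_fanout=3
):
    """Hypothetical sketch: spread activation from seed pages along hyperlinks.

    links:      dict mapping a page to the list of pages it links to
    seeds:      dict mapping each user-chosen relevant page to its initial activation
    decay:      attenuation applied on every hop (assumed value)
    threshold:  activation constraint; only nodes at or above it fire
    max_pulses: distance constraint; number of propagation steps
    max_fanout: fan-out constraint; pages with more out-links do not fire
    """
    activation = defaultdict(float, seeds)
    fired = set()
    frontier = dict(seeds)
    for _ in range(max_pulses):
        incoming = defaultdict(float)
        for node, level in frontier.items():
            out = links.get(node, [])
            # Skip weak nodes, broad hubs, and nodes that have already fired
            # (firing at most once stops cycles from re-exciting nodes).
            if level < threshold or len(out) > max_fanout or node in fired:
                continue
            fired.add(node)
            for target in out:
                incoming[target] += level * decay
        # Additive rule: each node accumulates the attenuated input it receives.
        for node, extra in incoming.items():
            activation[node] += extra
        frontier = {node: activation[node] for node in incoming}
    # Rank non-seed pages by accumulated activation, ties broken by name.
    return sorted(
        ((n, a) for n, a in activation.items() if n not in seeds),
        key=lambda pair: (-pair[1], pair[0]),
    )


if __name__ == "__main__":
    toy_links = {
        "A": ["B", "C", "H"],
        "B": ["D"],
        "C": ["D", "E"],
        "H": ["P1", "P2", "P3", "P4", "P5"],  # a hub page
    }
    print(constrained_spreading_activation(toy_links, {"A": 1.0}))
    # [('B', 0.5), ('C', 0.5), ('D', 0.5), ('H', 0.5), ('E', 0.25)]
```

With this toy graph, the hub page H is itself activated but, because of the fan-out constraint, does not pass activation on to its five out-links, while D accumulates activation from both B and C.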
