10
1965 International Conference on Computational Linguistics

MEASUREMENT OF SIMILARITY BETWEEN NOUNS

Kenneth E. Harper
The RAND Corporation
1700 Main Street
Santa Monica, California 90406

ABSTRACT

A study was made of the degree of similarity between pairs of Russian nouns, as expressed by their tendency to occur in sentences with identical words in identical syntactic relationships. A similarity matrix was prepared for forty nouns; for each pair of nouns the number of shared (i) adjective dependents, (ii) noun dependents, and (iii) noun governors was automatically retrieved from machine-processed text. The similarity coefficient for each pair was determined as the ratio of the total of such shared words to the product of the frequencies of the two nouns in the text. The 780 pairs were ranked according to this coefficient. The text comprised 120,000 running words of physics text processed at The RAND Corporation; the frequencies of occurrence of the forty nouns in this text ranged from 42 to 328.

The results suggest that the sample of text is of sufficient size to be useful for the intended purpose. Many noun pairs with similar properties (synonymy, antonymy, derivation from distributionally similar verbs, etc.) are characterized by high similarity coefficients; the converse is not observed. The relevance of various syntactic relationships as criteria for measurement is discussed.

1. INTRODUCTION

One of the goals of studies in Distributional Semantics is the establishment of word classes on the basis of the observed behavior of words in written texts. A convenient and significant way of discussing behavior of words is in terms of syntactic relationship. At the outset, in fact, it is necessary that we treat a word in terms of its Syntactically Related Words (SRW). In a given text, each word bears a given syntactic relationship to a finite number of other words; e.g., a finite number of words (nouns and pronouns) appear as subject for each active verb; another group of nouns and pronouns are used as direct object of each transitive verb; other words of the class, adverb, appear as modifiers of a given verb. In each instance we may speak of the related words as SRW of a given verb, so that in our example three different types of SRW emerge; a given SRW is then defined in terms both of word class and specific relationship to the verb. (A given noun may of course belong to two different types of SRW, e.g., as both subject and object of the same verb.) Distributionally, we may compare two verbs in terms of their SRW.

The objective of the present study is to test the premise that similar words tend to have the same SRW. This premise is tested, not with verbs, as in the above example, but with nouns. Our procedure is (1) to find in a given text three types of SRW for a small group of nouns, (2) to find the number of SRW shared by each pair of nouns formed from the group, and (3) to express the similarity between individual nouns, and groups of nouns, as a function of their shared SRW. Another example: it might turn out that in a given text the nouns a and b (avocado and cherry) share such adjective modifiers as ripe, whereas nouns c and d (chair and furniture) have in common the adjective modifier modern. These facts would lead us to conclude that a and b are similar, that c and d are similar, that a and c are less similar, etc.

A number of questions arise: What is similarity anyway? Do words that are similar in meaning really share a significant number of SRW in a given text? What is a significant number? Do not dissimilar words also have many common SRW? How much text is necessary in order to establish patterns of word behavior? What is the effect of multiple-meaning in words, and of using texts from different subject areas?

The present investigation should be regarded as an experiment designed to throw some light on these questions; no validity is claimed for the results obtained. Our audacity in attempting the experiment at all is based on three factors: the possession of a text in a limited field (physics), the foreknowledge that the multiple-meaning problem is minimal, and the capability for automatic processing of text. (The latter is clearly a necessity, in view of the size and complexity of the problem.) The reader may well conclude that the experiment proves nothing. We would hope, however, that such an opinion would not preclude a critical judgment of the procedures employed, or the suspension of disbelief if the results do not correspond with his expectations.

2. PROCEDURE

The present study was based on a series of articles from Russian physics journals, comprising approximately 120,000 running words (some 500 pages). The processing of this text has been described elsewhere.(1,2) Here, we note only that each sentence of this text is recorded on magnetic tape, together with the following information for each occurrence in the sentence: its part of speech, its word number (an identification number in the machine glossary), and its syntactic governor or dependent (if any) in the sentence. A retrieval program applied to this text tape then yielded information about the SRW for words in which we were interested. For convenience and economy, all words in the machine printout for this study are identified by word number, rather than in their natural-language form.

In our study we chose to deal with the SRW of forty Russian nouns, herein called Test Words (TW). The number is completely arbitrary; the particular nouns chosen (see Table 1) were presumed to form different semantic groupings. Table 1 gives one possible grouping of these words; the criteria for grouping are more or less obvious, although the reader may easily form different groups, by expanding or contracting the groups that we have designated. The only purpose of grouping is to provide a weak measure of c
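
The similarity coefficient defined in the abstract (total shared SRW across the three relationship types, divided by the product of the two nouns' text frequencies) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the relationship-type labels, the function names, the toy SRW sets (borrowed from the avocado/cherry example in the introduction), and the invented frequencies. The sketch also assumes that shared SRW are counted as distinct words rather than as occurrences, which the text does not state.

```python
from itertools import combinations

# Hypothetical labels for the three SRW types named in the abstract.
RELATION_TYPES = ("adjective_dependent", "noun_dependent", "noun_governor")


def shared_srw_count(srw_a, srw_b):
    """Total number of distinct SRW shared by two nouns, summed over all types."""
    return sum(
        len(srw_a.get(rel, set()) & srw_b.get(rel, set()))
        for rel in RELATION_TYPES
    )


def similarity_coefficient(srw_a, srw_b, freq_a, freq_b):
    """Shared SRW total divided by the product of the two noun frequencies."""
    return shared_srw_count(srw_a, srw_b) / (freq_a * freq_b)


def rank_pairs(srw, freq):
    """Score every unordered pair of test words and sort by descending coefficient."""
    scores = {
        (a, b): similarity_coefficient(srw[a], srw[b], freq[a], freq[b])
        for a, b in combinations(sorted(srw), 2)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented toy data (not from the study): SRW sets and text frequencies.
srw = {
    "avocado": {"adjective_dependent": {"ripe", "green"}},
    "cherry": {"adjective_dependent": {"ripe", "red"}},
    "chair": {"adjective_dependent": {"modern", "red"}},
    "furniture": {"adjective_dependent": {"modern"}},
}
freq = {"avocado": 4, "cherry": 5, "chair": 5, "furniture": 2}

ranked = rank_pairs(srw, freq)
for pair, score in ranked:
    print(pair, score)
```

With forty test words, the same pairing step yields 40 × 39 / 2 = 780 pairs, which matches the number of pairs ranked in the study. Dividing by the product of frequencies keeps frequent nouns, which naturally accumulate more SRW, from dominating the ranking.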