Representing Discourse Coherence: A Corpus-Based Study

Florian Wolf∗
University of Cambridge

Edward Gibson∗∗
Massachusetts Institute of Technology

This article has two aims: to present a set of discourse structure relations that are easy to code, and to develop criteria for an appropriate data structure for representing these relations. Discourse structure here refers to informational relations that hold between sentences in a discourse. The set of discourse relations introduced here is based on Hobbs (1985). We present a method for annotating discourse coherence structures that we used to manually annotate a database of 135 texts from the Wall Street Journal and the AP Newswire. All texts were independently annotated by two annotators. Kappa values greater than 0.8 indicated good interannotator agreement. We furthermore present evidence that trees are not a descriptively adequate data structure for representing discourse structure: In coherence structures of naturally occurring texts, we found many different kinds of crossed dependencies, as well as many nodes with multiple parents. The claims are supported by statistical results from our hand-annotated database of 135 texts.

∗ Computer Laboratory and Genetics Department, Cambridge, CB3 0FD, U.K. E-mail: Florian.Wolf@cl.cam.ac.uk
∗∗ Department of Brain and Cognitive Sciences, Cambridge, MA 02139. E-mail: egibson@mit.edu.
Submission received: 15th June 2004; Revised submission received: 5th September 2004; Accepted for publication: 23rd October 2004
© 2005 Association for Computational Linguistics
Computational Linguistics, Volume 31, Number 2

1. Introduction

An important component of natural language discourse understanding and production is having a representation of discourse structure. A coherently structured discourse here is assumed to be a collection of sentences that are in some relation to each other. This article has two aims: to present a set of discourse structure relations that are easy to code, and to develop criteria for an appropriate data structure for representing these relations.

There have been two kinds of approaches to defining and representing discourse structure and coherence relations. These approaches differ with respect to what kinds of discourse structure they are intended to represent. Some accounts aim to represent the intentional-level structure of a discourse; in these accounts, coherence relations reflect how the role played by one discourse segment with respect to the interlocutors’ intentions relates to the role played by another segment (e.g., Grosz and Sidner 1986). Other accounts aim to represent the informational structure of a discourse; in these accounts, coherence relations reflect how the meaning conveyed by one discourse segment relates to the meaning conveyed by another discourse segment (e.g., Hobbs 1985; Marcu 2000; Webber et al. 1999). Furthermore, accounts of discourse structure vary greatly with respect to how many discourse relations they assume, ranging from 2 (Grosz and Sidner 1986) to over 400 different coherence relations (reported in Hovy and Maier [1995]). However, Hovy and Maier (1995) argue that, at least for informational-level accounts, taxonomies with more relations represent subtypes of taxonomies with fewer relations. This means that different informational-level-based taxonomies can be compatible with each other; they differ in how detailed or fine-grained their representations of the informational structure of texts are. Going beyond the question of how different informational-level accounts can be compatible with each other, Moser and Moore (1996) discuss the compatibility of rhetorical structure theory (RST) (Mann and Thompson 1988) with the theory of Grosz and Sidner (1986). Note, however, that Moser and Moore (1996) focus on the compatibility of the claims that Mann and Thompson (1988) and Grosz and Sidner (1986) make about intentional-level discourse structure.

In this article, we aim to develop an easy-to-code representation of informational relations that hold between sentences or other nonoverlapping segments in a discourse monologue. We describe an account with a small number of relations in order to achieve more generalizable representations of discourse structures; however, the number is not so small that the informational structures we are interested in are obscured. The goal of the research presented here is not to encode intentional relations in texts. We consider annotating intentional relations too difficult to implement in practice at this time. Note that we do not claim that the intentional-level structure of discourse is irrelevant to a full account of discourse coherence; it simply is not the focus of this article.

The next section describes in detail the set of coherence relations we use, which are mostly based on Hobbs (1985). We try to make as few a priori theoretical assumptions about representational data structures as possible. These assumptions are outlined in the next section. Importantly, however, we do not assume a tree data structure to represent discourse coherence structures. In fact, a major result of this article is that trees do not seem adequate to represent discourse structures.

This article is organized as follows. Section 2 describes the procedure we used to collect a database of 135 texts annotated with coherence relations. Section 3 describes in detail the descriptive inadequacy of tree structures for representing discourse coherence, and Section 4 provides statistical evidence from our database that supports this claim. Section 5 offers some concluding remarks.

2. Collecting a Database of Texts Annotated with Coherence Relations

This section describes (1) how we defined discourse segments, (2) which coherence relations we used to connect discourse segments, and (3) how the annotation procedure worked.

2.1 Di