Current Approaches to Punctuation in Computational Linguistics

B. Say and V. Akman
Dept. of Computer Engineering and Information Science, Bilkent University, 06533 Bilkent, Ankara, Turkey
{say,akman}@cs.bilkent.edu.tr

Abstract. Some recent studies in computational linguistics have aimed to take advantage of various cues presented by punctuation marks. This short survey is intended to summarise these research efforts and, additionally, to outline a current perspective on the usage and functions of punctuation marks. We conclude by presenting an information-based framework for punctuation, influenced by treatments of several related phenomena in computational linguistics.

Keywords: discourse, information, natural language syntax, natural language semantics, punctuation

Abbreviations: DRT (discourse representation theory); DRS (discourse representation structure); NLP (natural language processing); NLG (natural language generation); RST (rhetorical structure theory); SDRT (segmented discourse representation theory); SDRS (segmented discourse representation structure)

This paper has not been submitted elsewhere in identical or similar form, nor will it be during the first three months after its submission to Computers and the Humanities, without the permission of Kluwer Academic Publishers.

1. Introduction

Until the eighties, linguists had studied punctuation marks little, and mostly from a prescriptive standpoint. Similarly, most natural language processing systems took no account of punctuation marks apart from the period and spacing. Recent work in linguistics (computational, corpus, and applied), however, has given a descriptive treatment of the role of punctuation in contemporary written language. Furthermore, various natural language processing systems have started to make use of the syntactic cues that punctuation marks provide.

This survey is short and by no means exhaustive. We intend to present the current state of the incorporation of punctuation marks into natural language processing (NLP) systems. We also summarise recent research, computational or within general linguistics, on descriptive characterisations of punctuation. In this survey, punctuation marks include not only the standard marks, such as the comma, colon, period and dash. They also include more graphical devices such as paragraphs, tables, lists, and features for emphasis (such as the use of italics).

The rest of the survey is organised as follows. Section 2 gives a current perspective on the history of punctuation and its place in writing today. In Section 3, we present some of the current linguistic studies, excluding the computational ones. Section 4 summarises and evaluates relevant NLP work in computational linguistics, mostly on the relationship between syntax and punctuation. Section 5 discusses the semantic, intonational, and discourse-level implications of punctuation. Section 6 concludes with an information-based perspective on punctuation.

2. Punctuation and Written Language

According to Parkes (1993), punctuation developed in several stages that paralleled the development of the written medium. The readers of each stage had different demands that needed to be satisfied, and these demands shaped the marks and their functions.

In the Classical Latin period, education was directed at preparing students for effective public speaking (Parkes, 1993, p. 5). Authors dictated their writing to scribes. Only for teaching purposes did an author, a scribe, or a corrector put marks on the manuscript, and these marks indicated pauses of different lengths. Spaces between lexical words did not become customary until the tenth century (Levinson, 1985, p. 23). Rather than punctuating for oral readers, some grammarians saw writing as a means of silently conveying meaning to the reader (Parkes, 1993, p. 21). During the eighth century, the Irish devised new graphic conventions for the written text, because Latin was mainly a written or visible language for them. They later passed those conventions on to the Anglo-Saxons (Parkes, 1993, p. 23). From the 12th century onwards, a general inventory of punctuation marks was designed. There was still no standardisation, however, since even two scribes copying the same manuscript used different marks (Parkes, 1993, p. 69).

When writing spread beyond the monasteries and the clergy and began to be used for secular purposes, economy and speed in reading became more important (Levinson, 1985, p. 38). Writers started to use punctuation to bring out the relationships between the grammatical constituents of the sentence. In particular, during the 14th to 16th centuries, the humanists wanted their texts to be persuasive and demonstrative. They therefore adopted a larger set of punctuation marks to disambiguate the logical structure of sentences. New marks corresponding to today's parentheses, semicolon, and exclamation mark were devised in the 15th century.

From the 16th century onwards, the widespread use of printing brought a gradual standardisation. Types and fonts were precut and sold to printers, so scribes no longer personalised the available repertory of marks. Before printing, the destination of a manuscript (e.g., a specific monastery or library) was usually known in advance. Once printing became the norm, this pre-existing connection between the publisher and the client was broken, and there was now greater pressure for texts to be generally understandable and readable. The orthographic sentence became the fundamental unit of information, presented to the reader in an easy-to-understand manner (Levinson, 1985, p. 157). Symbols such