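Cui Gang, Sheng Yongmei (Department of Foreign Languages, Tsinghua University, Beijing 100084)

Journal of Tsinghua University (Philosophy and Social Sciences), Vol. 15, No. 1, 2000

Abstract: Corpus annotation is the key step in making raw corpus data machine-readable, and it is also an important research topic in corpus linguistics. Drawing on relevant research in China and abroad, and on the annotation practice of several large English corpora built outside China, this paper introduces and discusses the principles, schemes and types of corpus annotation, as a reference for the construction of English corpora in China.
Key words: corpus; corpus data; annotation
CLC number: H087   Document code: B   Article ID: 1000-0062(2000)01-0089-06
Received: 1999-09-05
About the authors: Cui Gang (1966- ) [...]; Sheng Yongmei (1976- ) [...].

[...] (McEnery & Wilson, 1996). [...]

Leech (1993) sets out the following principles for corpus annotation:
1. It should be possible to remove the annotation from an annotated corpus and recover the raw corpus.
2. It should be possible to extract the annotations by themselves from the text.
3. The annotation scheme should be based on guidelines that are available to the end user.
4. It should be made clear how, and by whom, the annotation was carried out.
5. The end user should be made aware that corpus annotation is not infallible, but simply a potentially useful tool.
6. Annotation schemes should be based as far as possible on widely agreed, theory-neutral analyses of the corpus data.
7. No annotation scheme has the a priori right to be considered a standard; standards emerge through practical consensus.
[...]

The COCOA format has been adopted by OCP (Oxford Concordance Program), the Longman-Lancaster corpus and [...]. A COCOA reference is enclosed in angle brackets and consists of a code letter naming a type of information, followed by the value of that information. A, for example, stands for "author", so to record that the author of a text is SHAKESPEAR, the reference is <A SHAKESPEAR>. [...]

The TEI (Text Encoding Initiative) [...] (McEnery & Wilson, 1996). The British National Corpus is encoded according to the TEI. The TEI was sponsored by the Association for Computational Linguistics (ACL), the Association for Literary and Linguistic Computing (ALLC) and the Association for Computers and the Humanities (ACH). It is based on SGML (Standard Generalized Markup Language) [...]. Every TEI text carries a header, [...].

TEI marks up text with two devices: tags and entity references. Tags mark the boundaries of units of text [...]. Each unit opens with a start tag, <...>, and closes with an end tag, </...>. A paragraph, for example, runs from <p> to </p>. Entity references, defined in a feature system declaration (FSD), begin with & and end with ;. In the tag vvd, for example, the first v stands for verb, the second v for lexical verb and d for past tense [...]. The word contained with this tag is therefore encoded as contained&vvd; rather than contained_vvd [...].

Which tags and entity references may occur in a document, and how they nest, is specified in a document type definition (DTD). [...]

1. [...] For example, qcea.tag [...] QCE [...] (tag) [...] A. [...] COCOA [...] TEI. Example 1 shows part of a TEI header (McEnery & Wilson, 1996: 32):

Example 1.
<TEIHEADER>
<FILEDESC>
<TITLESTMT>
<TITLE>Lives of the Saints from the Book of Lismore: an electronic edition</TITLE>
<AUTHOR>Anonymous</AUTHOR>
<RESPSTMT>
<RESP>compiled by</RESP>
<NAME>Elva Johnston</NAME>
</RESPSTMT>
</TITLESTMT>
<EDITIONSTMT>
<EDITION N="1">First Draft, Revised and corrected.
<DATE>1993-04-30</DATE>
</EDITION>
<RESPSTMT>
<RESP>Proof correction by</RESP>
<NAME>Dr Nicole Meller</NAME>

The header records that the TEI text is Lives of the Saints from the Book of Lismore: an electronic edition, by an anonymous author, compiled by Elva Johnston. The first draft, revised and corrected, is dated 30 April 1993, and the proofs were corrected by Nicole Meller.

2. Part-of-speech tagging. [...] The COBUILD tagset, for example, is as follows (only the English glosses survive):
BE be; BED were; BEDZ was; BEG be -ing; BEM am; BEN been; BER are;
CC; CD; CS; DEM; DO do; DOD do; DOZ do; DT; DTG; DTP; EX there;
HV have; HVD have; HVG have -ing; HVN have; HVZ have; IN; JJ; MD; NEG not;
NN; NNS; NP; PN; PPL; PPLS; PPO; PP; PPS; RB; TO; UH (yes, ugh, um);
VB; VBD; VBG -ing; VBN; VBZ; WH wh-

3. Lemmatization. [...] had, has and having, for example, all belong to the lemma have. [...] (Beale, 1987). [...] Geoffrey Sampson's SUSANNE corpus [...], as in Example 2 (columns: reference number, status, tag, word form, lemma):

Example 2.
N12:0510g  _  PPHS1m  He  he
N12:0510h  _  VVDv  studied  study
N12:0510i  _  AT  the  the
N12:0510j  _  NN1c  problem  problem
N12:0510k  _  IF  for  for
N12:0510m  _  DD221  a  a
N12:0510n  _  DD222  few  few
N12:0510p  _  NNT2  seconds  second
N12:0520a  _  CC  and  and
N12:0520b  _  VVDv  thought  think
N12:0520c  _  IO  of  of
N12:0520d  _  AT1  a  a
N12:0520e  _  NNc  means  means
N12:0520f  _  IIb  by  by
N12:0520g  _  DDQr  which  which
N12:0520h  _  PPH1  it  it
N12:0520i  _  VMd  might  may
N12:0520j  _  VB0  be  be
N12:0520k  _  VVNt  solved  solve
N12:0520m  _  +.  _

4. Syntactic annotation (parsing). [...] the British National Corpus (BNC), the Lancaster-Leeds treebank and the Spoken English Corpus [...]. Consider the sentence Claudia sat on a stool. (S = sentence, NP = noun phrase, VP = verb phrase, PP = prepositional phrase, N = noun, V = verb, P = preposition, AT = article):

Example 3. [...]

In labelled-bracket form (BNC), the same sentence is annotated as follows:

Example 4.
[S [NP Claudia NP1 NP] [VP sat VVD [PP on II [NP a AT1 stool NN1 NP] PP] VP] S]

[...] Parsing may be full parsing or skeleton parsing. [...] Example 5 is taken from the Lancaster-Leeds treebank and Example 6 from the Spoken English Corpus:

Example 5.
[S [Ncs another DT new JJ style NN feature NN Ncs] [Vzb is BEZ Vzb] [Ns the ATI [NN/JJ& wine-glass NN [JJ+ or CC flared JJ JJ+] NN/JJ&] heel NN , , [Fr [Nq which WDT Nq] [Vzp was BEDZ shown VBN Vzp] [Tn [Vn teamed VBN Vn] [R up RP R] [P with INW [Np [JJ/JJ/NN& pointed JJ , , [JJ squared JJ JJ] , , [NN+ and CC chisel NN NN+] JJ/JJ/NN&] toes NNS Np] P] Tn] Fr] Ns] . . S]

Example 6.
[S& [P For IF [N the AT members NN2 [P of IO [N this DD1 university NNL1 N] P] N] P] [N this DD1 charter NN1 N] [V enshrines VVZ [N a AT1 victorious JJ principle NN1 N] V] S&] ; ; and CC [S+ [N the AT fruits NN2 [P of IO [N that DD1 victory NN1 N] P] N] [V can VM immediately RR be VB0 seen VVN [P in II [N the AT international JJ community NNJ [P of IO [N scholars NN2 N] P] [Fr that CST [V has VHZ graduated VVN here RL today RT V] Fr] N] P] V] S+] . .

Comparing Examples 5 and 6 [...]: Example 6 labels every noun phrase simply N, whereas Example 5 distinguishes types such as Ncs, Ns, Nq and Np.

5. Semantic annotation. [...] Janssen (1990) [...] the field codes of the Longman Dictionary of Contemporary English. [...] Klaus Schmidt [...]. Wilson [...] (McEnery & Wilson, 1996) [...]. The codes Wilson assigns in Example 7 are 00000000 (And/and, the, a, of, it, on, his, they, him), 13010000 (thorns), 21030000 (head), 21072000 (platted, put), 21110321 (robe), 21110400 (crown), 23241000 (soldiers) and 31241100 (purple):

Example 7.
And 00000000 the 00000000 soldiers 23241000 platted 21072000 a 00000000 crown 21110400 of 00000000 thorns 13010000 and 00000000 put 21072000 it 00000000 on 00000000 his 00000000 head 21030000 and 00000000 they 00000000 put 21072000 on 00000000 him 00000000 a 00000000 purple 31241100 robe 21110321

In Example 7, Wilson draws on the coding scheme of Schmidt (1993).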
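Each word in Example 7 is followed by its eight-digit code, so the annotation can be read back mechanically. The Python sketch below is an illustration only. It is not Wilson's or Schmidt's software. It splits the word/code sequence into pairs and groups words by code prefix. This assumes the codes are hierarchical, so that words sharing their leading digits fall in the same broader category. The function names `read_pairs`, `words_under` and `group_by_level` are hypothetical. Setting aside the all-zero code, which Example 7 assigns to words such as and, the, a, of and his, is also a choice made for this sketch.

```python
from collections import defaultdict

# Example 7 as printed: every word is followed by its eight-digit semantic code.
EXAMPLE_7 = (
    "And 00000000 the 00000000 soldiers 23241000 platted 21072000 "
    "a 00000000 crown 21110400 of 00000000 thorns 13010000 "
    "and 00000000 put 21072000 it 00000000 on 00000000 his 00000000 "
    "head 21030000 and 00000000 they 00000000 put 21072000 "
    "on 00000000 him 00000000 a 00000000 purple 31241100 robe 21110321"
)

# Code that Example 7 gives to and, the, a, of, his, etc.
ZERO_CODE = "00000000"


def read_pairs(text):
    """Split an alternating 'word code word code ...' string into (word, code) pairs."""
    tokens = text.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating words and codes")
    pairs = list(zip(tokens[0::2], tokens[1::2]))
    for word, code in pairs:
        # Every code in Example 7 has exactly eight digits.
        if len(code) != 8 or not code.isdigit():
            raise ValueError(f"bad code {code!r} after {word!r}")
    return pairs


def words_under(pairs, prefix):
    """Words whose code begins with `prefix` (assumed: same branch of the hierarchy)."""
    return [word for word, code in pairs if code.startswith(prefix)]


def group_by_level(pairs, level):
    """Group words by the first `level` digits of their code, skipping the all-zero code."""
    groups = defaultdict(list)
    for word, code in pairs:
        if code != ZERO_CODE:
            groups[code[:level]].append(word)
    return dict(groups)


if __name__ == "__main__":
    pairs = read_pairs(EXAMPLE_7)
    print(words_under(pairs, "2111"))  # ['crown', 'robe']
    print(group_by_level(pairs, 4))
```

At a four-digit prefix, platted and both occurrences of put fall together under 2107, and crown and robe fall together under 2111. Shorter prefixes give coarser groupings of the same words.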
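The scheme has three top-level categories, 1, 2 and 3 [...], each of which is subdivided further. In the code for crown, 211104, the leading 2 places the word in one top-level category. Each following digit, down to the final 4, places it in a narrower subcategory [...].

6. Discourse annotation. [...] Stenström (1984) annotated the London-Lund Corpus of Spoken English with 16 categories of discourse tags, [...] for example apologies (sorry, excuse me), hedges (kind of, sort of), greetings (hello, good morning) and politeness markers (please). [...] Halliday and Hasan (1976), Cohesion in English. [...] the Lancaster-Oslo/Bergen Corpus [...].

[...] (1998) [...].

References:
[1] Beale, A. Towards a distributional lexicon. In Garside, R., Leech, G. & Sampson, G. (eds.) The Computational Analysis of English: A Corpus-Based Approach. Longman, 1987.
[2] Halliday, M. & Hasan, R. Cohesion in English. Longman, 1976.
[3] Janssen, S. Automatic sense-disambiguation with LDOCE: enriching syntactically analyzed corpora with semantic data. In Aarts, J. & Meijs, W. (eds.) Theory and Practice in Corpus Linguistics. Rodopi, 1990.
[4] Leech, G. Corpus annotation schemes. Literary and Linguistic Computing, 1993, 8(4): 275-281.
[5] McEnery, T. & Wilson, A. Corpus Linguistics. Edinburgh University Press, 1996.
[6] Schmidt, K. M. Begriffsglossar und Index zu Ulrichs von Zatzikhoven Lanzelet. Niemeyer, 1993.
[7] Stenström, A.-B. Discourse tags. In Aarts, J. & Meijs, W. (eds.) Theory and Practice in Corpus Linguistics. Rodopi, 1984.
[8] [...] [J]. [...], 1998, (3): 17-28.
[9] [...] [J]. [...], 1998, (3): 4-12.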
Title: Annotation of Data in Corpora