A warm welcome to teachers from all over the country!

Common Terms in Corpus Linguistics
Frank Liang

Corpus (a body of language data; the word can also mean "corpse"):
– (pl. corpora or corpuses): a collection of text, now usually in machine-readable form, compiled to be representative of a particular kind of language and often provided with some kind of annotation.
– A collection of electronic texts, gathered according to explicit sampling criteria, that represents a language or a variety or genre of that language.

Corpus linguistics
Corpus linguistics is grounded in large amounts of authentic language data. It reaches its conclusions mainly through probabilistic and statistical methods, by observing and generalizing over a corpus systematically and exhaustively. In essence, it is empirical.

Why build corpora? Why study language with corpus methods and apply the results to language learning?

Example: start or begin? Which is more common in speech? Teachers often say things like "Let's begin!", but is that the usual choice?
Searches of the BNC (British National Corpus) and other corpora have found that start is the more common verb in speech. Corpus methods are based on how language is actually used: the facts speak for themselves.

A corpus can be analyzed using software tools, much like those used to find keywords on the Internet, but with greater sophistication. By evaluating the results of these searches, it is possible to see how language is really used and to answer questions like these:
What are the most frequent words and phrases in English?
Which tenses do people use most often?
What prepositions follow particular verbs?
How do people use words like can, may, and might?
How many words must a learner know in order to take part in everyday conversation?
Materials developed with a corpus can therefore be more authentic and can illustrate language as it is really used.

Types of corpora
Annotated corpus (or tagged corpus): a corpus enhanced with various types of linguistic information. An annotated corpus may be considered a repository of linguistic information, because information that was implicit in the plain text has been made explicit through concrete annotation ("added value").
Comparable (reference) corpus: a corpus used to compare different languages or different types of language. Comparable corpora often follow the same composition pattern, and if they are annotated, their annotation schemes are often similar.
Monolingual corpus: a corpus that contains texts in a single language.
Multilingual corpus: a set of small individual monolingual corpora (or subcorpora) that use the same or similar sampling procedures and categories for each language but contain completely different texts in those languages.
Parallel (aligned) corpus: a multilingual corpus in which texts in one language and their translations into other languages are aligned sentence by sentence, preferably phrase by phrase.
Special corpus: a corpus assembled for a specific purpose; special corpora vary in size and composition according to that purpose. They are not balanced (except within the scope of their given purpose) and, if used for other purposes, give a distorted view of the language segment. Their main advantage is that the texts can be selected so that the phenomena one is looking for occur much more frequently than in a balanced corpus. A corpus enriched in this way can be much smaller than a balanced corpus that provides the same data.
General corpus:

Token: an individual running word.
Type: a distinct word form, i.e. the number of tokens counted without repetition.
"I see a cat and a dog" contains seven tokens but only six types (the type "a" occurs twice).
The sentence "Rose is a rose is a rose is a rose." was written by Gertrude Stein as part of the 1913 poem Sacred Emily.

Type/token ratio (TTR)
TTR of the Rose sentence: 4/10 × 100 = 40 (10 tokens; 4 types, Rose, is, a, rose, with capitalized Rose and lowercase rose counted as different forms).
TTR is a common measure of the lexical variety of a text (often called lexical density) and can help indicate a text's lexical difficulty. However, texts contain many function words (such as the, a, of) that recur again and again: each additional word adds one token but not necessarily a new type. The longer the text, the more often function words repeat and the lower the TTR. TTR is therefore not a sound measure for comparing texts of different lengths.

Standardized type/token ratio (STTR)
For example, calculate the TTR for every 1,000 words of each text and average the values to obtain the STTR.

Frequencies / occurrences: the raw number of times an item occurs.
Frequency (normalized): the number of occurrences of a word per million words, per hundred thousand words, and so on.
A word's frequencies in two corpora are often compared, taking the size of each corpus into account, with a chi-square test or log-likelihood, to determine whether the word is used differently in the two corpora.

Keywords
– Keywords are words whose normalized frequency in one corpus (the observed corpus) is significantly higher or lower than in another comparable corpus (the reference corpus).
– Positive keywords (significantly more frequent in the observed corpus) and negative keywords (significantly less frequent).

Concordance (also called key word in context, KWIC)
The use of concordancing software to search a corpus for instances of a word or phrase, listing every matching instance together with its context.
A term that signifies a list of occurrences of a particular word or sequence of words, each shown in its context. The concordance is at the centre of corpus linguistics because it gives access to many important language patterns in texts. Concordances of major works such as the Bible and Shakespeare have been available for many years, and the computer has made concordances easy to compile. (A concordancer is concordancing software; concordance lines are the individual lines it produces.) Computer-generated concordances can be very flexible: the context of a word can be selected on various criteria (for example, by counting the words on either side). Interpreting concordance lines can be a demanding task.
Concordancing software includes AntConc and WordSmith Tools.

Concordance
1  instructed Shirl Winter to compose a note of thanks to be posted on the call board. Bakew
2  and turned away without a smile or a word of thanks. Usually she marked the few who did th
3  started out that way. And he had a feeling - thanks to the girl - that things would get wor
4  laugh. Guess I can't think of anyone, Pete. Thanks anyhow. A faint crease appeared betwe
5  son, his face worried. Scotty murmured, No, thanks, so softly his father had to bend his
6  gives the kind of thanks which are more than thanks: to them we are grateful beyond the po
7  ce. To all, the Foundation gives the kind of thanks which are more than thanks: to them we
8  ed by the two-step prepolymer method, today, thanks to new catalysts, they can be produced
9  sing it, I should like to record one vote of thanks to them for the clarity with which the
10 ingather, I smiled a bleak one which said, Thanks, baby, but I
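The token and type counts and the TTR of the Rose sentence can be reproduced with a short Python sketch. It assumes a deliberately simple tokenizer (runs of letters, optionally with one internal apostrophe); real tools such as AntConc and WordSmith Tools use their own, configurable token definitions. The function names and the `fold_case` flag are hypothetical, added to show how case handling changes the type count.

```python
import re


def tokenize(text: str) -> list[str]:
    # Simplified tokenizer (an assumption): a token is a run of letters,
    # optionally with one internal apostrophe (e.g. "can't"); punctuation is dropped.
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)


def type_token_ratio(text: str, fold_case: bool = False) -> tuple[int, int, float]:
    """Return (number of tokens, number of types, TTR as a percentage)."""
    tokens = tokenize(text)
    if fold_case:
        # Treat "Rose" and "rose" as the same type.
        tokens = [token.lower() for token in tokens]
    n_tokens = len(tokens)
    n_types = len(set(tokens))  # distinct word forms
    ttr = n_types / n_tokens * 100 if n_tokens else 0.0
    return n_tokens, n_types, ttr


for sentence in ("I see a cat and a dog", "Rose is a rose is a rose is a rose."):
    print(sentence, type_token_ratio(sentence))
```

Run as written, the Rose sentence gives 10 tokens, 4 types, and a TTR of 40, matching the slide; with `fold_case=True`, Rose and rose merge into one type and the TTR falls to 30.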
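The length problem with TTR, and the chunked averaging behind STTR, can be sketched as follows. The function names are hypothetical, the input is assumed to be an already tokenized list of words, and discarding a final chunk shorter than `chunk_size` is an assumption of this sketch, not a documented rule of any particular tool.

```python
def ttr(tokens: list[str]) -> float:
    # Plain type/token ratio, as a percentage, over the whole token list.
    return len(set(tokens)) / len(tokens) * 100


def sttr(tokens: list[str], chunk_size: int = 1000) -> float:
    # Mean TTR over consecutive, non-overlapping chunks of chunk_size tokens.
    # Assumption: a final chunk shorter than chunk_size is discarded.
    ratios = [
        ttr(tokens[start:start + chunk_size])
        for start in range(0, len(tokens) - chunk_size + 1, chunk_size)
    ]
    if not ratios:
        raise ValueError("need at least chunk_size tokens")
    return sum(ratios) / len(ratios)


# Toy example: the same four words repeated five times (20 tokens).
tokens = ["the", "cat", "sat", "down"] * 5
print(ttr(tokens), sttr(tokens, chunk_size=4))
```

In this toy example the whole-text TTR is 20 while the STTR with `chunk_size=4` is 100: every chunk is fully varied, but the plain ratio keeps falling as the repeated words pile up.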
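Comparing a word's frequency across two corpora of different sizes starts from normalized frequencies and then applies a significance test. This sketch computes a per-million rate, a Pearson chi-square on a 2×2 table (the word vs. all other words, corpus 1 vs. corpus 2, without Yates's correction), and the two-cell log-likelihood statistic commonly used in corpus comparison. The function names and the counts in the usage lines are hypothetical.

```python
import math


def per_million(freq: int, corpus_size: int) -> float:
    # Normalized frequency: occurrences per million running words.
    return freq / corpus_size * 1_000_000


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no Yates correction) on [[a, c - a], [b, d - b]].

    a, b = the word's frequency in corpus 1 and corpus 2;
    c, d = the corpus sizes in tokens. Assumes 0 < a + b < c + d.
    """
    observed = [[a, c - a], [b, d - b]]
    row_totals = [c, d]
    col_totals = [a + b, (c - a) + (d - b)]
    n = c + d
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Two-cell log-likelihood comparing one word's frequency in two corpora."""
    e1 = c * (a + b) / (c + d)  # expected frequency in corpus 1
    e2 = d * (a + b) / (c + d)  # expected frequency in corpus 2
    g2 = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # 0 * ln(0) is taken as 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2


# Hypothetical counts: 20 vs. 10 occurrences in two corpora of 10,000 tokens each.
a, b, c, d = 20, 10, 10_000, 10_000
print(per_million(a, c), per_million(b, d))
print(round(chi_square(a, b, c, d), 2), round(log_likelihood(a, b, c, d), 2))
```

For these hypothetical counts the chi-square is about 3.34 and the log-likelihood about 3.40, both below 3.84, the critical value for p < 0.05 with one degree of freedom; so a doubled normalized frequency (2,000 vs. 1,000 per million) would not count as a significant difference at that level.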
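Keyword extraction applies the same kind of log-likelihood comparison to every word in an observed corpus and a reference corpus. In this minimal sketch the function names, the threshold of 3.84 (p < 0.05 with one degree of freedom), and the toy token lists are assumptions; a significant word is treated as a positive keyword when its normalized frequency is higher in the observed corpus and as a negative keyword when it is lower.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    # Two-cell log-likelihood: a, b = word frequencies; c, d = corpus sizes.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    g2 = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # 0 * ln(0) is taken as 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def keywords(observed_tokens, reference_tokens, threshold=3.84):
    """Return (positive, negative) lists of (word, LL) pairs, each sorted
    by descending log-likelihood. Both token lists must be non-empty."""
    obs = Counter(observed_tokens)
    ref = Counter(reference_tokens)
    c, d = sum(obs.values()), sum(ref.values())
    positive, negative = [], []
    for word in obs.keys() | ref.keys():
        a, b = obs[word], ref[word]  # Counter returns 0 for missing words
        ll = log_likelihood(a, b, c, d)
        if ll < threshold:
            continue  # not significant at the chosen level
        if a / c > b / d:
            positive.append((word, ll))  # relatively more frequent in observed corpus
        else:
            negative.append((word, ll))  # relatively less frequent in observed corpus
    positive.sort(key=lambda pair: pair[1], reverse=True)
    negative.sort(key=lambda pair: pair[1], reverse=True)
    return positive, negative


# Toy token lists (hypothetical data, not from any real corpus).
observed = ["corpus"] * 30 + ["the"] * 70
reference = ["corpus"] * 2 + ["the"] * 198
print(keywords(observed, reference))
```

With these toy lists, corpus comes out as a positive keyword and the as a negative one, because the is relatively more frequent in the reference corpus.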
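KWIC concordance lines like the thanks examples can be generated with a few lines of Python. This is a minimal sketch assuming whole-word, case-insensitive matching and a context window measured in characters (the hypothetical `width` parameter) rather than in words; dedicated concordancers such as AntConc and WordSmith Tools also offer sorting and word-based context, which this sketch omits.

```python
import re


def kwic(text: str, node: str, width: int = 30) -> list[str]:
    """Build simple KWIC concordance lines for a node word.

    Matches are whole-word and case-insensitive. Each line shows up to `width`
    characters of left and right context, with the left context right-aligned
    so that every occurrence of the node starts in the same column.
    """
    flat = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        lines.append(f"{left:>{width}}{match.group(0)}{right}")
    return lines


if __name__ == "__main__":
    sample = (
        "He turned away without a smile or a word of thanks. "
        "Guess I can't think of anyone, Pete. Thanks anyhow."
    )
    for number, line in enumerate(kwic(sample, "thanks"), start=1):
        print(f"{number:>2} {line}")
```

Because the left context is right-aligned to a fixed width, the node word lines up in a single column, which is what makes patterns such as "thanks to" and "word of thanks" easy to scan.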
Title: Common Corpus Terms Explained (1)