Lecture 2 --- Language Modeling
Prof. Hui Jiang
Department of Computer Science and Engineering
York University, Toronto, Canada
hj@cse.yorku.ca

• Acoustic Model (AM): gives the probability of generating feature X when W is uttered.
  – Need a model for every W to model all speech signals (features) from W → the HMM is an ideal model for speech.
  – Speech unit selection: which speech unit is modeled by each HMM? (phoneme, syllable, word, phrase, sentence, etc.)
    • A sub-word unit is more flexible (better).
• Language Model (LM): gives the probability that W (a word, phrase, or sentence) is chosen to be said.
  – Need a flexible model to calculate the probability for all kinds of W → Markov chain model (n-gram).
• The two models are combined in recognition:

  $\hat{W} = \arg\max_{W \in \Gamma} p(W \mid X) = \arg\max_{W \in \Gamma} P(W) \cdot p(X \mid W) = \arg\max_{W \in \Gamma} P_\Gamma(W) \cdot p_\Lambda(X \mid W)$

  where $p_\Lambda(X \mid W)$ is the acoustic model and $P_\Gamma(W)$ is the language model.

• Training Stage:
  – Acoustic modeling: how to select speech units and estimate HMMs reliably and efficiently from the available training data.
  – Language modeling: how to estimate an n-gram model from text training data; how to handle the data sparseness problem.
• Test Stage:
  – Search: given the HMMs and the n-gram model, how to efficiently search for the optimal path through a huge grammar network.
    • The search space is extremely large.
    • Calls for an efficient pruning strategy.

• An n-gram language model (LM) is essentially a Markov chain model, composed of a set of multinomial distributions.
• Given W = w_1, w_2, …, w_M, the LM probability Pr(W) is expressed as

  $\Pr(W) = \Pr(w_1, w_2, \ldots, w_M) = \prod_{i=1}^{M} p(w_i \mid h_i)$

  – where h_t = w_{t-n+1}, …, w_{t-1} is the history of w_t.
  – In a unigram, h_t = null (parameters ~ |V|, where |V| is the vocabulary size).
  – In a bigram, h_t = w_{t-1} (parameters ~ |V|·|V|).
  – In a trigram, h_t = w_{t-2} w_{t-1} (parameters ~ |V|·|V|·|V|).
  – In a 4-gram, h_t = w_{t-3} w_{t-2} w_{t-1} (parameters ~ |V|·|V|·|V|·|V|).
• How to evaluate the performance of an LM?

• Perplexity: the most widely used performance measure for LMs.
• Given an LM {Pr(·)} with vocabulary size |V| and a sufficiently long test word sequence W = w_1, w_2, …, w_M:
  – Calculate the negative log-probability per word: $LP = -\frac{1}{M} \log_2 \Pr(W)$
  – The perplexity of the LM is then computed as: $PP = 2^{LP}$
• Perplexity indicates that prediction by the LM is about as difficult as guessing a word among PP equally likely words.
• The smaller the PP value, the better the prediction capability of the LM.
• Training-set perplexity: how well the LM fits or explains the training data.
• Test-set perplexity: the generalization capability of the LM in predicting new text data.

• Large vocabulary size → exponential growth in the number of possible n-grams → exponential increase in LM parameters → much more training data and computing resources needed.
• Need to control the vocabulary size in the LM.
• Given the training text data:
  – Limit the vocabulary of the LM to the most frequent words occurring in the training corpus, e.g., the top N words.
  – All other words are mapped to the unknown word, UNK.
  – This gives the lowest out-of-vocabulary (OOV) rate for a given vocabulary size (see the sketch at the end of this section).
• Example: English newspaper WSJ (Wall Street Journal)
  – Training corpus: 37 million words (full 3-year archive)
  – Vocabulary: 20,000 words
  – OOV rate: 4%
  – 2-gram PP: 114
  – 3-gram PP: 76

• Collect a text corpus: tens of millions of words are needed for a 3-gram.
• Corpus preprocessing (very time-consuming):
  – Text clean-up: remove punctuation and other symbols.
  – Normalization: 0.1% → (zero) point one percent; 6:00 → six o'clock; 1/2 → one half; …
  – Surround each sentence with the tags <s> and </s>.
  – Language-specific processing: e.g., for some oriental languages (Chinese, Japanese, etc.), do tokenization → find word boundaries in a stream of characters.
  – Output: clean text, e.g., <s> w1 w2 w3 w4 w5 </s> <s> … </s> <s> … </s> …
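To make the vocabulary-control and perplexity definitions above concrete, here is a minimal Python sketch (not part of the original slides): it keeps the top-N training words, maps everything else to UNK, trains a unigram model, and computes test-set perplexity. The add-one smoothing and all function names are assumptions made for this illustration; the slides discuss n-gram models in general, and a unigram is used here only for brevity.

```python
from collections import Counter
import math

def build_vocab(train_words, top_n):
    """Keep the top-N most frequent training words; everything else becomes UNK."""
    counts = Counter(train_words)
    return {w for w, _ in counts.most_common(top_n)}

def map_oov(words, vocab):
    """Replace out-of-vocabulary words by the UNK symbol."""
    return [w if w in vocab else "<UNK>" for w in words]

def unigram_lm(train_words, vocab):
    """Unigram estimate p(w) ~ N(w) / N, with add-one smoothing over the vocabulary
    (an assumption for this sketch) so that test probabilities are never exactly zero."""
    words = map_oov(train_words, vocab)
    counts = Counter(words)
    total = sum(counts.values())
    v = len(vocab) + 1                      # +1 for <UNK>
    return lambda w: (counts[w] + 1) / (total + v)

def perplexity(lm, test_words, vocab):
    """PP = 2^LP, where LP = -(1/M) * sum_i log2 p(w_i)."""
    words = map_oov(test_words, vocab)
    lp = -sum(math.log2(lm(w)) for w in words) / len(words)
    return 2 ** lp

if __name__ == "__main__":
    train = "the cat sat on the mat the dog sat on the rug".split()
    test = "the cat sat on the rug".split()
    vocab = build_vocab(train, top_n=5)
    lm = unigram_lm(train, vocab)
    oov = sum(w not in vocab for w in test) / len(test)
    print(f"OOV rate: {oov:.2f}")
    print(f"Test-set perplexity: {perplexity(lm, test, vocab):.2f}")
```

For a bigram or trigram model the perplexity computation is identical; only $p(w_i \mid h_i)$ replaces $p(w_i)$.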
• LM parameter estimation from clean text:
  – The entire training text can be mapped into an ordered sample of n-grams without loss of information: S = h_1 w_1, h_2 w_2, …, h_T w_T (assuming the training corpus contains T words).
  – Group together all n-grams with the same history h: S_h = h w_{x1}, h w_{x2}, …, h w_{xn}.
  – S_h can be viewed as an i.i.d. sample from Pr(w | h).
  – Denote p_{hw} = p(w | h) for all possible w and h.
  – So the probability of S_h follows a multinomial distribution:

    $\Pr(S_h) \propto \prod_{w \in V} [p(w \mid h)]^{N(hw)}$

    where N(hw) is the frequency of the n-gram hw in S_h.

• Maximum-likelihood (ML) estimation of a multinomial distribution is easy to derive.
• The ML estimate of the n-gram LM is

  $\{p^{ML}_{hw}\} = \arg\max_{\{p_{hw}\}} \prod_{w \in V} [p_{hw}]^{N(hw)} = \arg\max_{\{p_{hw}\}} \sum_{w \in V} N(hw) \ln p_{hw}$

  subject to the constraints $\sum_{w \in V} p_{hw} = 1$ for all h, which gives

  $p^{ML}(w \mid h) = \frac{N(hw)}{\sum_{w \in V} N(hw)} = \frac{N(hw)}{N(h)}$

• The natural conjugate prior of the multinomial distribution is the Dirichlet distribution.
• Choose a Dirichlet distribution as the prior:

  $p(\{p_{hw}\}) \propto \prod_{w \in V} [p_{hw}]^{K(hw)}$

  – where {K(hw)} are hyper-parameters that specify the prior.
• Derive the posterior p.d.f. by Bayesian learning:

  $p(\{p_{hw}\} \mid S_h) \propto \prod_{w \in V} [p_{hw}]^{K(hw) + N(hw)}$

• Maximization of the posterior p.d.f. → the MAP estimate:

  $p^{MAP}_{hw} = \frac{N(hw) + K(hw)}{\sum_{w \in V} [N(hw) + K(hw)]}$

• MAP estimates of the n-gram LM can be used for smoothing (see the sketch after this section).

• ML estimation never works in practice, due to data sparseness.
• Example: in 1.2 million words of English text (vocabulary of 1,000 words):
  – 20% of bigrams and 60% of trigrams occur only once.
  – 85% of trigrams occur fewer than five times.
  – After observing the whole 1.2 Mw of data, the expected chance of seeing a new bigram is 22%, and a new trigram 65%.
• In ML estimation: zero frequency → zero probability.
• The data sparseness problem cannot be solved by collecting more data:
  – Extremely uneven distribution of n-grams in natural language.
  – Once the amount of data reaches a certain point, the rate at which the OOV rate (or the rate of new n-grams) decreases with more data becomes extremely slow.
• Calls for a better estimation strategy.
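As a minimal illustration of the ML and MAP estimates above (not from the original slides), the Python sketch below counts bigrams from a toy corpus and compares the two estimators. It assumes a uniform prior count K(hw) = K for every word, under which the MAP estimate reduces to add-K smoothing; the corpus and function names are invented for the example.

```python
from collections import Counter, defaultdict

def bigram_counts(sentences):
    """Count N(hw) for bigrams, with <s>/</s> sentence tags."""
    n_hw = defaultdict(Counter)
    for sent in sentences:
        words = ["<s>"] + sent.split() + ["</s>"]
        for h, w in zip(words[:-1], words[1:]):
            n_hw[h][w] += 1
    return n_hw

def ml_estimate(n_hw, h, w):
    """p_ML(w|h) = N(hw) / N(h); zero for unseen bigrams."""
    n_h = sum(n_hw[h].values())
    return n_hw[h][w] / n_h if n_h else 0.0

def map_estimate(n_hw, vocab, h, w, k=1.0):
    """p_MAP(w|h) = (N(hw) + K) / sum_w (N(hw) + K), assuming a uniform
    Dirichlet prior K(hw) = K; unseen bigrams now get a small nonzero mass."""
    n_h = sum(n_hw[h].values())
    return (n_hw[h][w] + k) / (n_h + k * len(vocab))

if __name__ == "__main__":
    corpus = ["the cat sat", "the dog sat", "the cat ran"]
    n_hw = bigram_counts(corpus)
    vocab = {w for c in n_hw.values() for w in c} | set(n_hw)
    print("p_ML(sat | cat)  =", ml_estimate(n_hw, "cat", "sat"))          # 0.5
    print("p_ML(ran | dog)  =", ml_estimate(n_hw, "dog", "ran"))          # 0.0 (zero frequency)
    print("p_MAP(ran | dog) =", map_estimate(n_hw, vocab, "dog", "ran"))  # > 0 after smoothing
```

With K(hw) = 1 for all w this is Laplace (add-one) smoothing, one simple way to avoid the zero-probability problem noted above; non-uniform choices of K(hw) yield other smoothing behaviors within the same MAP formula.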