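Introduction to Natural Language Processing
2006.11

Terminology
Natural Language Processing (NLP). Related terms: Chinese Information Processing, Natural Language Understanding, Computational Linguistics, Human Language Technology.

Chinese character coverage
GB2312-80 contains 6763 Chinese characters. Cumulative coverage of text by the N most frequent characters:

    N       Coverage
    1       4%
    20      16.7%
    32      21%
    300     65%
    600     81%
    2048    98%
    3072    99.7%
    3838    99.9%
    5177    99.99%
    6209    99.993%

Inflecting language vs. analytic language
Example sentences:
    All professors came here.
    Even Professor Zhang came here.
    Editing is very difficult.
    How to become a good editor?

Periods: 50s, 50s-60s, 70s-80s, 80s.

Ambiguity and ill-formedness
    Ambiguity: one written form with two readings, le4 and yue4.
    Unknown words (out-of-vocabulary words).
    Example: "My car drinks gasoline like water."

Knowledge acquisition; hybrid approach.

References
    Christopher D. Manning, Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999.
    Daniel Jurafsky, James H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice Hall, 2000.
    ISO/IEC 10646 / Unicode.
    Summary, Explanations, and Remarks: GB18030-2000, from Unicode, email: dmeyer@adobe.com


Probability
For an event A in the sample space \Omega, the probability P(A) satisfies:

\[ 1.\quad P(A) \ge 0 \]
\[ 2.\quad P(\Omega) = 1 \]
\[ 3.\quad P\Big(\bigcup_{i=0}^{\infty} A_i\Big) = \sum_{i=0}^{\infty} P(A_i) \quad \text{whenever } A_i \cap A_j = \varnothing \text{ for all } i \ne j \]

Maximum likelihood estimation (MLE)
Let the sample space be \{s_1, s_2, \ldots, s_n\}. If outcome s_k (1 \le k \le n) occurs n_N(s_k) times in N trials, its relative frequency is

\[ q_N(s_k) = \frac{n_N(s_k)}{N} \]

Since \sum_{k=1}^{n} n_N(s_k) = N, it follows that \sum_{k=1}^{n} q_N(s_k) = 1. As N grows, q_N(s_k) approaches the probability P(s_k):

\[ \lim_{N \to \infty} q_N(s_k) = P(s_k) \]

Relative frequencies of the 20 most frequent characters, in descending order (the characters themselves are missing from this copy):

    0.040855    0.008470    0.006012    0.005406
    0.013994    0.008356    0.005857    0.005172
    0.011758    0.007297    0.005720    0.005117
    0.010175    0.006821    0.005705    0.004824
    0.009034    0.006557    0.005488    0.004685

(Read down each column, left to right.)

Conditional probability
For events A and B in \Omega with P(B) > 0, the probability of A given B is

\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} \]

In general, P(A \mid B) \ne P(A).

Law of total probability
Let B_1, B_2, \ldots, B_n be a partition of \Omega with P(B_i) > 0 for i = 1, 2, \ldots, n, and let A be an event in \Omega. Then

\[ P(A) = P\Big(\bigcup_{i=1}^{n} A B_i\Big) = \sum_{i=1}^{n} P(A B_i) = \sum_{i=1}^{n} P(B_i)\, P(A \mid B_i) \]

Bayes' theorem
Let A be an event in \Omega and B_1, B_2, \ldots, B_n a partition of \Omega, with P(A) > 0 and P(B_i) > 0 for i = 1, 2, \ldots, n. Then

\[ P(B_i \mid A) = \frac{P(B_i)\, P(A \mid B_i)}{\sum_{j=1}^{n} P(B_j)\, P(A \mid B_j)} \]

For n = 1:

\[ P(B \mid A) = \frac{P(A \mid B)\, P(B)}{P(A)} \]

Prior probability: P(B_i). Posterior probability: P(B_i \mid A).

Given A, the most probable S maximizes P(S \mid A):

\[ \hat{S} = \arg\max_{S} P(S \mid A) = \arg\max_{S} \frac{P(S)\, P(A \mid S)}{P(A)} \]

P(A) is fixed once A is given, so

\[ \hat{S} = \arg\max_{S} P(S)\, P(A \mid S) \]

where P(A \mid S) is the likelihood and P(S) the prior.

Example
Something occurs once in 100,000 cases. A test reports it with probability 0.95 when it is present and with probability 0.005 when it is absent. Let G be the event that it is present and T the event that the test is positive:

\[ P(G) = \frac{1}{100000} = 0.00001, \qquad P(\bar{G}) = \frac{100000 - 1}{100000} = 0.99999 \]
\[ P(T \mid G) = 0.95, \qquad P(T \mid \bar{G}) = 0.005 \]
\[ P(G \mid T) = \frac{P(T \mid G)\, P(G)}{P(T \mid G)\, P(G) + P(T \mid \bar{G})\, P(\bar{G})} = \frac{0.95 \times 0.00001}{0.95 \times 0.00001 + 0.005 \times 0.99999} \approx 0.002 \]

Binomial distribution
In n independent trials, event A occurs with probability p in each trial. Let X be the number of times A occurs, X = 0, 1, \ldots, n. Then

\[ p_r = C_n^r\, p^r (1 - p)^{n - r}, \qquad C_n^r = \frac{n!}{(n - r)!\, r!}, \qquad 0 \le r \le n \]

and X is said to follow the binomial distribution, X \sim B(n, p).

Expectation
Let X take the values x_k with P(X = x_k) = p_k, k = 1, 2, \ldots. When the series \sum_{k=1}^{\infty} x_k p_k converges absolutely, the expectation of X is

\[ E(X) = \sum_{k=1}^{\infty} x_k p_k \]

Variance

\[ Var(X) = E\big((X - E(X))^2\big) = E(X^2) - E^2(X) \]


Information theory
In 1948, Shannon introduced the concept of entropy into information theory.

Entropy
Let X be a discrete random variable with p(x) = P(X = x), x \in \mathcal{X}. The entropy of X is

\[ H(X) = -\sum_{x \in \mathcal{X}} p(x) \log_2 p(x) \]

with the convention 0 \log 0 = 0. H(X) is also written H(p). Its unit is the bit.

Example: 26 equally likely letters.

\[ H(X) = -\sum_{x \in \mathcal{X}} p(x) \log_2 p(x) = 26 \times \Big(-\frac{1}{26} \log_2 \frac{1}{26}\Big) = \log_2 26 = 4.70 \]

English letters (figure given on the slide: 438023): H = 4.1606 bits, which matches the letter frequencies below.

    E 0.1268    L 0.0394    P 0.0186
    T 0.0978    D 0.0389    B 0.0156
    A 0.0788    U 0.0280    V 0.0102
    O 0.0776    C 0.0268    K 0.0060
    I 0.0707    F 0.0256    X 0.0016
    N 0.0706    M 0.0244    J 0.0010
    S 0.0634    W 0.0214    Q 0.0009
    R 0.0594    Y 0.0202    Z 0.0006
    H 0.0573    G 0.0187

Chinese characters (figures given on the slide: 6000, 12366): H = 9.65 bits.

Joint entropy
For a pair of discrete random variables X, Y with joint distribution p(x, y):

\[ H(X, Y) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log_2 p(x, y) \]

Conditional entropy
For (X, Y) \sim p(x, y):

\[ H(Y \mid X) = \sum_{x \in \mathcal{X}} p(x)\, H(Y \mid X = x) = \sum_{x \in \mathcal{X}} p(x) \Big[ -\sum_{y \in \mathcal{Y}} p(y \mid x) \log p(y \mid x) \Big] = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(y \mid x) \]

Chain rule

\[ H(X, Y) = H(X) + H(Y \mid X) \]
\[ H(X_1, \ldots, X_n) = H(X_1) + H(X_2 \mid X_1) + \cdots + H(X_n \mid X_1, \ldots, X_{n-1}) \]

Proof:

\[
\begin{aligned}
H(X, Y) &= -E_{p(x,y)}\big(\log p(x, y)\big) \\
        &= -E_{p(x,y)}\big(\log (p(x)\, p(y \mid x))\big) \\
        &= -E_{p(x,y)}\big(\log p(x) + \log p(y \mid x)\big) \\
        &= -E_{p(x)}\big(\log p(x)\big) - E_{p(x,y)}\big(\log p(y \mid x)\big) \\
        &= H(X) + H(Y \mid X)
\end{aligned}
\]

Mutual information
For (X, Y) \sim p(x, y), the mutual information between X and Y is

\[ I(X; Y) = H(X) - H(X \mid Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log_2 \frac{p(x, y)}{p(x)\, p(y)} \]

I(X; Y) is the reduction in uncertainty about X once Y is known. Since H(X \mid X) = 0,

\[ H(X) = H(X) - H(X \mid X) = I(X; X) \]

[Figure: diagram relating H(X), H(Y), H(X, Y), H(X \mid Y), H(Y \mid X), and I(X; Y).]

Relative entropy (Kullback-Leibler divergence)
For two probability distributions p(x) and q(x):

\[ D(p \,\|\, q) = \sum_{x \in \mathcal{X}} p(x) \log \frac{p(x)}{q(x)} \]

with the conventions 0 \log(0 / q) = 0 and p \log(p / 0) = \infty. D(p \| q) \ge 0, and in general D(p \| q) \ne D(q \| p).

Cross entropy
Let X \sim p(x) and let q(x) be a distribution used to approximate p(x). The cross entropy of X and q is

\[ H(X, q) = H(X) + D(p \,\|\, q) = -\sum_{x} p(x) \log q(x) \]

For a language L = (X_i) \sim p(x) and a model q:

\[ H(L, q) = -\lim_{n \to \infty} \frac{1}{n} \sum_{x_1^n} p(x_1^n) \log q(x_1^n) \]

where x_1^n = x_1, \ldots, x_n is a sequence from L, p(x_1^n) is its probability in L, and q(x_1^n) is the model's estimate of that probability. If L is stationary and ergodic:

\[ H(L, q) = -\lim_{n \to \infty} \frac{1}{n} \log q(x_1^n) \]

Perplexity
For a sample l_1^n = l_1 \ldots l_n of the language L:

\[ PP_q = 2^{H(L, q)} \approx 2^{-\frac{1}{n} \log q(l_1^n)} = \big[ q(l_1^n) \big]^{-\frac{1}{n}} \]

Given an output O, the most probable input I is

\[ \hat{I} = \arg\max_{I} p(I \mid O) = \arg\max_{I} \frac{p(I)\, p(O \mid I)}{p(O)} = \arg\max_{I} p(I)\, p(O \mid I) \]

where the input is a sequence i_1 i_2 i_3 … i_n and the output a sequence o_1 o_2 o_3 … o_n.


2006.11

Parts of speech and word formation
    Derivation: wide → widely, difficult → difficultly
    Compounding: college degree, overtake, mad cow disease
    Articles: the, a

Word order
    I put the bagels in the freezer.
    The bagels, I put in the freezer.
    I put in the fridge the bagels (that John had given me).

Interchangeable phrases

    She                              him
    The woman                        the man
    The tall woman            saw    the short man
    The very tall woman              the very short man
    The tall woman with sad eyes     the short man with red hair

Phrase structure tree

    [S [NP That man]
       [VP [VBD caught]
           [NP the butterfly]
           [PP [IN with] [NP a net]]]]

Noun phrase (NP)
    The homeless old man in the park that I tried to help yesterday

Prepositional phrase (PP)
    In the morning, to the west, at the same place, etc.

Verb phrase (VP)
    Getting to school on time was a struggle.
    He was trying to keep his temper.
    That woman quickly showed me the way to hide.

Adjective phrase (AP)
    She is very sure of herself.
    He seemed a man who was quite certain to succeed.

Rewrite rules (start symbol 'S')
    S → NP VP
    NP → AT NNS | AT NN | NP PP
    VP → VP PP | VBD | VBD NP
    PP → I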
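The Bayes' theorem example from the probability slides (P(G) = 0.00001, P(T|G) = 0.95, P(T|not G) = 0.005) can be checked numerically. The sketch below is only an illustration; the function name `posterior` and its parameter names are hypothetical and do not come from the slides.

```python
def posterior(prior, p_pos_given_true, p_pos_given_false):
    """Return P(G | T) via Bayes' theorem with the two-way partition {G, not G}.

    prior             -- P(G)
    p_pos_given_true  -- P(T | G)
    p_pos_given_false -- P(T | not G)
    """
    # Law of total probability: P(T) = P(T|G) P(G) + P(T|not G) P(not G)
    evidence = p_pos_given_true * prior + p_pos_given_false * (1.0 - prior)
    return p_pos_given_true * prior / evidence


# Figures from the slide example.
p_g_given_t = posterior(prior=1 / 100_000,
                        p_pos_given_true=0.95,
                        p_pos_given_false=0.005)
print(round(p_g_given_t, 3))  # 0.002
```

Even with a 0.95 detection rate, the posterior stays near 0.002 because the prior is so small that false positives outnumber true positives.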
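The entropy figures from the information theory slides can be reproduced with a few lines of Python. This is a minimal sketch: the helper name `entropy` is hypothetical, and the English letter frequencies are copied from the slide table as given (they sum to about 1.0003 and are deliberately not renormalized).

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# 26 equally likely letters: H = log2(26).
h_uniform = entropy([1 / 26] * 26)

# English letter frequencies as listed on the slide.
ENGLISH = {
    "E": 0.1268, "T": 0.0978, "A": 0.0788, "O": 0.0776, "I": 0.0707,
    "N": 0.0706, "S": 0.0634, "R": 0.0594, "H": 0.0573, "L": 0.0394,
    "D": 0.0389, "U": 0.0280, "C": 0.0268, "F": 0.0256, "M": 0.0244,
    "W": 0.0214, "Y": 0.0202, "G": 0.0187, "P": 0.0186, "B": 0.0156,
    "V": 0.0102, "K": 0.0060, "X": 0.0016, "J": 0.0010, "Q": 0.0009,
    "Z": 0.0006,
}
h_english = entropy(ENGLISH.values())

print(round(h_uniform, 2))  # 4.7
print(round(h_english, 2))  # about 4.16
```

The skewed English distribution carries roughly half a bit less per letter than the uniform case, which is the point of the comparison on the slides.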
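The relations among entropy, relative entropy, cross entropy, and perplexity can also be checked on a toy distribution. The sketch below rests on assumptions not stated in the slides: the distributions `p` and `q` are invented for illustration, q(x) is taken to be positive wherever p(x) is, and `perplexity` takes per-token model probabilities whose product stands in for q(l_1^n).

```python
import math


def entropy(p):
    """H(X) in bits for a distribution given as {outcome: probability}."""
    return -sum(px * math.log2(px) for px in p.values() if px > 0)


def kl_divergence(p, q):
    """D(p || q) in bits; assumes q[x] > 0 wherever p[x] > 0."""
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)


def cross_entropy(p, q):
    """H(X, q) = -sum_x p(x) log2 q(x); same assumption on q."""
    return -sum(px * math.log2(q[x]) for x, px in p.items() if px > 0)


def perplexity(token_probs):
    """PP = 2 ** (-(1/n) * log2 q(l_1^n)), with q(l_1^n) taken as the
    product of the per-token model probabilities in token_probs."""
    n = len(token_probs)
    log_q = sum(math.log2(q) for q in token_probs)
    return 2 ** (-log_q / n)


# Hypothetical "true" distribution p and model q.
p = {"a": 0.5, "b": 0.25, "c": 0.25}
q = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}

print(entropy(p))                               # H(X)
print(kl_divergence(p, q))                      # D(p || q), non-negative
print(cross_entropy(p, q))                      # equals H(X) + D(p || q)
print(kl_divergence(p, q), kl_divergence(q, p)) # not symmetric
print(perplexity([0.25] * 8))                   # a model giving 1/4 to every token
```

A model that assigns probability 1/4 to each token has perplexity 4, which is why perplexity is often read as the effective number of equally likely choices per token.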
Document title: Natural Language Processing Courseware
Link: https://www.777doc.com/doc-5349067 .html