Parameter estimation for text analysis

Gregor Heinrich
Technical Note
vsonix GmbH + University of Leipzig, Germany
gregor@vsonix.com

Abstract. Presents parameter estimation methods common with discrete probability distributions, which is of particular interest in text modeling. Starting with maximum likelihood, a posteriori and Bayesian estimation, central concepts like conjugate distributions and Bayesian networks are reviewed. As an application, the model of latent Dirichlet allocation (LDA) is explained in detail with a full derivation of an approximate inference algorithm based on Gibbs sampling, including a discussion of Dirichlet hyperparameter estimation.

History: version 1: May 2005, version 2.4: August 2008.

1 Introduction

This technical note is intended to review the foundations of Bayesian parameter estimation in the discrete domain, which is necessary to understand the inner workings of topic-based text analysis approaches like probabilistic latent semantic analysis (PLSA) [Hofm99], latent Dirichlet allocation (LDA) [BNJ02] and other mixture models of count data. Despite their general acceptance in the research community, it appears that there is no common book or introductory paper that fills this role: Most known texts use examples from the Gaussian domain, where formulations appear to be rather different. Other very good introductory work on topic models (e.g., [StGr07]) skips details of algorithms and other background for clarity of presentation.

We therefore will systematically introduce the basic concepts of parameter estimation with a couple of simple examples on binary data in Section 2. We then will introduce the concept of conjugacy along with a review of the most common probability distributions needed in the text domain in Section 3. The joint presentation of conjugacy with associated real-world conjugate pairs directly justifies the choice of distributions introduced. Section 4 will introduce Bayesian networks as a graphical language to describe systems via their probabilistic models.

With these basic concepts, we present the idea of latent Dirichlet allocation (LDA) in Section 5, a flexible model to estimate the properties of text. On the example of LDA, the usage of Gibbs sampling is shown as a straightforward means of approximate inference in Bayesian networks. Two
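other important aspects of LDA are discussed afterwards: In Section 6, the influence of LDA hyperparameters is discussed and an estimation method proposed, and in Section 7, methods are presented to analyse LDA models for querying and evaluation.

2 Parameter estimation approaches

We face two inference problems, (1) to estimate values for a set of distribution parameters $\vartheta$ that can best explain a set of observations $\mathcal{X}$ and (2) to calculate the probability of new observations $\tilde{x}$ given previous observations, i.e., to find $p(\tilde{x}|\mathcal{X})$. We will refer to the former problem as the estimation problem and to the latter as the prediction or regression problem.

The data set $\mathcal{X} \triangleq \{x_i\}_{i=1}^{|\mathcal{X}|}$ can be considered a sequence of independent and identically distributed (i.i.d.) realisations of a random variable (r.v.) $X$. The parameters $\vartheta$ are dependent on the distributions considered, e.g., for a Gaussian, $\vartheta = \{\mu, \sigma^2\}$.

For these data and parameters, a couple of probability functions are ubiquitous in Bayesian statistics. They are best introduced as parts of Bayes' rule, which follows from $p(\vartheta|\mathcal{X})\,p(\mathcal{X}) = p(\mathcal{X},\vartheta) = p(\mathcal{X}|\vartheta)\,p(\vartheta)$:

$$p(\vartheta|\mathcal{X}) = \frac{p(\mathcal{X}|\vartheta)\,p(\vartheta)}{p(\mathcal{X})}, \tag{1}$$

and we define the corresponding terminology:

$$\text{posterior} = \frac{\text{likelihood} \cdot \text{prior}}{\text{evidence}}. \tag{2}$$

In the next paragraphs, we will show different estimation methods that start from simple maximisation of the likelihood, then show how prior belief on parameters can be incorporated by maximising the posterior and finally use Bayes' rule to infer a complete posterior distribution.

2.1 Maximum likelihood estimation

Maximum likelihood (ML) estimation tries to find parameters that maximise the likelihood,

$$L(\vartheta|\mathcal{X}) \triangleq p(\mathcal{X}|\vartheta) = p\Bigl(\bigcap_{x\in\mathcal{X}} \{X = x\} \,\Big|\, \vartheta\Bigr) = \prod_{x\in\mathcal{X}} p(x|\vartheta), \tag{3}$$

i.e., the probability of the joint event that $X$ generates the data $\mathcal{X}$; the product form follows from the i.i.d. assumption. Because of the product in Eq. 3, it is often simpler to use the log likelihood, $\mathcal{L} \triangleq \log L$. The ML estimation problem then can be written as:

$$\hat{\vartheta}_{ML} = \operatorname*{argmax}_{\vartheta} \mathcal{L}(\vartheta|\mathcal{X}) = \operatorname*{argmax}_{\vartheta} \sum_{x\in\mathcal{X}} \log p(x|\vartheta). \tag{4}$$

The common way to obtain the parameter estimates is to solve the system:

$$\frac{\partial \mathcal{L}(\vartheta|\mathcal{X})}{\partial \vartheta_k} \stackrel{!}{=} 0 \quad \forall \vartheta_k \in \vartheta. \tag{5}$$

The probability of a new observation $\tilde{x}$ given the data $\mathcal{X}$ can now be found using the approximation:

$$p(\tilde{x}|\mathcal{X}) = \int_{\vartheta\in\Theta} p(\tilde{x}|\vartheta)\,p(\vartheta|\mathcal{X})\,\mathrm{d}\vartheta \tag{6}$$

$$\approx \int_{\vartheta\in\Theta} p(\tilde{x}|\hat{\vartheta}_{ML})\,p(\vartheta|\mathcal{X})\,\mathrm{d}\vartheta = p(\tilde{x}|\hat{\vartheta}_{ML}), \tag{7}$$

that is, the next sample is anticipated to be distributed with the estimated parameters $\hat{\vartheta}_{ML}$. The last equality holds because $p(\tilde{x}|\hat{\vartheta}_{ML})$ does not depend on the integration variable and the posterior $p(\vartheta|\mathcal{X})$ integrates to one.

As an example, consider a set $\mathcal{C}$ of $N$ Bernoulli experiments with unknown parameter $p$, e.g., realised by tossing a deformed coin. The Bernoulli density function for the r.v. $C$ for one experiment is:

$$p(C{=}c|p) = p^{c}(1-p)^{1-c} \triangleq \mathrm{Bern}(c|p) \tag{8}$$

where we define $c=1$ for heads and $c=0$ for tails. Building an ML estimator for the parameter $p$ can be done by expressing the (log) likelihood as a function of the data:

$$\mathcal{L} = \log \prod_{i=1}^{N} p(C{=}c_i|p) = \sum_{i=1}^{N} \log p(C{=}c_i|p) \tag{9}$$

$$= n^{(1)} \log p(C{=}1|p) + n^{(0)} \log p(C{=}0|p) = n^{(1)} \log p + n^{(0)} \log(1-p) \tag{10}$$

where $n^{(c)}$ is the number of times a Bernoulli experiment yielded event $c$. Differentiating with respect to (w.r.t.) the parameter $p$ yields:

$$\frac{\partial \mathcal{L}}{\partial p} = \frac{n^{(1)}}{p} - \frac{n^{(0)}}{1-p} \stackrel{!}{=} 0 \;\Leftrightarrow\; \hat{p}_{ML} = \frac{n^{(1)}}{n^{(1)}+n^{(0)}} = \frac{n^{(1)}}{N}, \tag{11}$$

which is simply the ratio of heads results to the total number of samples. To put some numbers into the example, we could imagine that our coin is strongly deformed, and after 20 trials, we have $n^{(1)}=12$ times heads and $n^{(0)}=8$ times tails. This results in an ML estimate of $\hat{p}_{ML} = 12/20 = 0.6$.

2.2 Maximum a posteriori estimation

Maximum a posteriori (MAP) estimation is very similar to ML estimation but allows some a priori belief on the parameters to be included by weighting them with a prior distribution $p(\vartheta)$. The name derives from the obj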
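
Returning to the coin example of Section 2.1, the ML estimator of Eq. 11 and the plug-in prediction of Eq. 7 can be sketched in a few lines of Python. This is a minimal illustration, not part of the original note: the function names `bernoulli_log_likelihood` and `bernoulli_ml_estimate` and the grid search used as a sanity check are assumptions introduced here.

```python
import math


def bernoulli_log_likelihood(p, n1, n0):
    """Log likelihood of Eq. 10: n1 * log(p) + n0 * log(1 - p)."""
    return n1 * math.log(p) + n0 * math.log(1.0 - p)


def bernoulli_ml_estimate(outcomes):
    """Closed-form ML estimate of Eq. 11: fraction of outcomes with c = 1."""
    if not outcomes:
        raise ValueError("need at least one observation")
    n1 = sum(outcomes)  # number of heads, n^(1)
    return n1 / len(outcomes)


# 20 tosses with 12 heads (c = 1) and 8 tails (c = 0), as in the text.
outcomes = [1] * 12 + [0] * 8
n1 = sum(outcomes)
n0 = len(outcomes) - n1

p_ml = bernoulli_ml_estimate(outcomes)

# Sanity check (illustrative): search a grid over (0, 1) for the value of p
# with the highest log likelihood and compare it with the closed form.
grid = [k / 1000 for k in range(1, 1000)]
p_grid = max(grid, key=lambda p: bernoulli_log_likelihood(p, n1, n0))

# Plug-in prediction of Eq. 7: the next toss is heads with probability p_ml.
p_next_heads = p_ml

print(f"ML estimate: {p_ml:.3f}, grid maximiser: {p_grid:.3f}")
```

The grid search is only there to show that the closed-form estimate coincides with the numerical maximiser of the log likelihood; in practice the closed form of Eq. 11 is all that is needed.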