Parameter estimation for text analysis

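Gregor Heinrich
Technical Note
vsonix GmbH + University of Leipzig, Germany
gregor@vsonix.com

Abstract. Presents parameter estimation methods common with discrete probability distributions, which is of particular interest in text modeling. Starting with maximum likelihood, a posteriori and Bayesian estimation, central concepts like conjugate distributions and Bayesian networks are reviewed. As an application, the model of latent Dirichlet allocation (LDA) is explained in detail with a full derivation of an approximate inference algorithm based on Gibbs sampling, including a discussion of Dirichlet hyperparameter estimation.

History: version 1: May 2005, version 2.4: August 2008.

1 Introduction

This technical note is intended to review the foundations of Bayesian parameter estimation in the discrete domain, which is necessary to understand the inner workings of topic-based text analysis approaches like probabilistic latent semantic analysis (PLSA) [Hofm99], latent Dirichlet allocation (LDA) [BNJ02] and other mixture models of count data. Despite their general acceptance in the research community, it appears that there is no common book or introductory paper that fills this role: Most known texts use examples from the Gaussian domain, where formulations appear to be rather different. Other very good introductory work on topic models (e.g., [StGr07]) skips details of algorithms and other background for clarity of presentation.

We therefore will systematically introduce the basic concepts of parameter estimation with a couple of simple examples on binary data in Section 2. We then will introduce the concept of conjugacy along with a review of the most common probability distributions needed in the text domain in Section 3. The joint presentation of conjugacy with associated real-world conjugate pairs directly justifies the choice of distributions introduced. Section 4 will introduce Bayesian networks as a graphical language to describe systems via their probabilistic models.

With these basic concepts, we present the idea of latent Dirichlet allocation (LDA) in Section 5, a flexible model to estimate the properties of text. On the example of LDA, the usage of Gibbs sampling is shown as a straightforward means of approximate inference in Bayesian networks. Two other important aspects of LDA are discussed afterwards: In Section 6, the influence of LDA hyperparameters is discussed and an estimation method proposed, and in Section 7, methods are presented to analyse LDA models for querying and evaluation.

2 Parameter estimation approaches

We face two inference problems, (1) to estimate values for a set of distribution parameters $\vartheta$ that can best explain a set of observations $\mathcal{X}$ and (2) to calculate the probability of new observations $\tilde{x}$ given previous observations, i.e., to find $p(\tilde{x} \mid \mathcal{X})$. We will refer to the former problem as the estimation problem and to the latter as the prediction or regression problem.

The data set $\mathcal{X} \triangleq \{x_i\}_{i=1}^{|\mathcal{X}|}$ can be considered a sequence of independent and identically distributed (i.i.d.) realisations of a random variable (r.v.) $X$. The parameters $\vartheta$ are dependent on the distributions considered, e.g., for a Gaussian, $\vartheta = \{\mu, \sigma^2\}$.

For these data and parameters, a couple of probability functions are ubiquitous in Bayesian statistics. They are best introduced as parts of Bayes' rule (derivation: $p(\vartheta \mid \mathcal{X}) \cdot p(\mathcal{X}) = p(\mathcal{X}, \vartheta) = p(\mathcal{X} \mid \vartheta) \cdot p(\vartheta)$), which is:

$$ p(\vartheta \mid \mathcal{X}) = \frac{p(\mathcal{X} \mid \vartheta) \cdot p(\vartheta)}{p(\mathcal{X})}, \qquad (1) $$

and we define the corresponding terminology:

$$ \text{posterior} = \frac{\text{likelihood} \cdot \text{prior}}{\text{evidence}}. \qquad (2) $$

In the next paragraphs, we will show different estimation methods that start from simple maximisation of the likelihood, then show how prior belief on parameters can be incorporated by maximising the posterior and finally use Bayes' rule to infer a complete posterior distribution.

2.1 Maximum likelihood estimation

Maximum likelihood (ML) estimation tries to find parameters that maximise the likelihood,

$$ L(\vartheta \mid \mathcal{X}) \triangleq p(\mathcal{X} \mid \vartheta) = p\Bigl( \bigcap_{x \in \mathcal{X}} \{X = x\} \,\Big|\, \vartheta \Bigr) = \prod_{x \in \mathcal{X}} p(x \mid \vartheta), \qquad (3) $$

i.e., the probability of the joint event that $X$ generates the data $\mathcal{X}$. Because of the product in Eq. 3, it is often simpler to use the log likelihood, $\mathcal{L} \triangleq \log L$. The ML estimation problem then can be written as:

$$ \hat{\vartheta}_{\mathrm{ML}} = \operatorname*{argmax}_{\vartheta} \mathcal{L}(\vartheta \mid \mathcal{X}) = \operatorname*{argmax}_{\vartheta} \sum_{x \in \mathcal{X}} \log p(x \mid \vartheta). \qquad (4) $$

The common way to obtain the parameter estimates is to solve the system:

$$ \frac{\partial \mathcal{L}(\vartheta \mid \mathcal{X})}{\partial \vartheta_k} \overset{!}{=} 0 \quad \forall\, \vartheta_k \in \vartheta. \qquad (5) $$

The probability of a new observation $\tilde{x}$ given the data $\mathcal{X}$ can now be found using the approximation:

$$ p(\tilde{x} \mid \mathcal{X}) = \int_{\vartheta \in \Theta} p(\tilde{x} \mid \vartheta)\, p(\vartheta \mid \mathcal{X})\, \mathrm{d}\vartheta \qquad (6) $$

$$ \approx \int_{\vartheta \in \Theta} p(\tilde{x} \mid \hat{\vartheta}_{\mathrm{ML}})\, p(\vartheta \mid \mathcal{X})\, \mathrm{d}\vartheta = p(\tilde{x} \mid \hat{\vartheta}_{\mathrm{ML}}), \qquad (7) $$

that is, the next sample is anticipated to be distributed with the estimated parameters $\hat{\vartheta}_{\mathrm{ML}}$.

As an example, consider a set $C$ of $N$ Bernoulli experiments with unknown parameter $p$, e.g., realised by tossing a deformed coin. The Bernoulli probability mass function for the r.v. $C$ for one experiment is:

$$ p(C = c \mid p) = p^{c} (1 - p)^{1 - c} \triangleq \mathrm{Bern}(c \mid p) \qquad (8) $$

where we define $c = 1$ for heads and $c = 0$ for tails. Building an ML estimator for the parameter $p$ can be done by expressing the (log) likelihood as a function of the data:

$$ \mathcal{L} = \log \prod_{i=1}^{N} p(C = c_i \mid p) = \sum_{i=1}^{N} \log p(C = c_i \mid p) \qquad (9) $$

$$ = n^{(1)} \log p(C = 1 \mid p) + n^{(0)} \log p(C = 0 \mid p) = n^{(1)} \log p + n^{(0)} \log(1 - p) \qquad (10) $$

where $n^{(c)}$ is the number of times a Bernoulli experiment yielded event $c$. Differentiating with respect to (w.r.t.) the parameter $p$ yields:

$$ \frac{\partial \mathcal{L}}{\partial p} = \frac{n^{(1)}}{p} - \frac{n^{(0)}}{1 - p} \overset{!}{=} 0 \;\Leftrightarrow\; \hat{p}_{\mathrm{ML}} = \frac{n^{(1)}}{n^{(1)} + n^{(0)}} = \frac{n^{(1)}}{N}, \qquad (11) $$

which is simply the ratio of heads results to the total number of samples. To put some numbers into the example, we could imagine that our coin is strongly deformed, and after 20 trials, we have $n^{(1)} = 12$ times heads and $n^{(0)} = 8$ times tails. This results in an ML estimate of $\hat{p}_{\mathrm{ML}} = 12/20 = 0.6$.

2.2 Maximum a posteriori estimation

Maximum a posteriori (MAP) estimation is very similar to ML estimation but allows us to include some a priori belief on the parameters by weighting them with a prior distribution $p(\vartheta)$. The name derives from the obj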
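
The coin example of Section 2.1 can be checked numerically. The following is a minimal Python sketch, not part of the original note: it evaluates the Bernoulli log likelihood of Eq. 10 and compares the closed-form ML estimate of Eq. 11 with a brute-force search over a grid of candidate values. The function names and the grid resolution of 0.001 are assumptions chosen for illustration.

```python
import math


def bernoulli_log_likelihood(p, n_heads, n_tails):
    """Log likelihood n1 * log(p) + n0 * log(1 - p), as in Eq. 10."""
    return n_heads * math.log(p) + n_tails * math.log(1.0 - p)


def bernoulli_ml_estimate(n_heads, n_tails):
    """Closed-form ML estimate n1 / (n1 + n0), as in Eq. 11."""
    return n_heads / (n_heads + n_tails)


# Counts from the text: 20 trials, 12 heads and 8 tails.
n_heads, n_tails = 12, 8
p_ml = bernoulli_ml_estimate(n_heads, n_tails)

# Illustrative cross-check: maximise the log likelihood over a grid of
# candidate values in (0, 1); the end points are excluded because the
# logarithm diverges there.
grid = [k / 1000 for k in range(1, 1000)]
p_grid = max(grid, key=lambda p: bernoulli_log_likelihood(p, n_heads, n_tails))

print("closed-form ML estimate:", p_ml)
print("grid-search maximiser:  ", p_grid)
```

Both values should agree up to the grid resolution, which illustrates that solving Eq. 5 analytically and maximising the log likelihood directly lead to the same estimate for this example.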
