Girolami, M. and Rogers, S. (2006) Variational Bayesian multinomial probit regression with Gaussian process priors. Neural Computation 18(8): pp. 1790-1817. Deposited 9 November 2007, Glasgow ePrints Service.

Department of Computing Science, University of Glasgow. Technical Report: TR-2005-205. {girolami,srogers}@dcs.gla.ac.uk. November 9, 2005.

Abstract

It is well known in the statistics literature that augmenting binary and polychotomous response models with Gaussian latent variables enables exact Bayesian analysis via Gibbs sampling from the parameter posterior. By adopting such a data augmentation strategy, dispensing with priors over regression coefficients in favour of Gaussian Process (GP) priors over functions, and employing variational approximations to the full posterior, we obtain efficient computational methods for Gaussian Process classification in the multi-class setting [1]. The model augmentation with additional latent variables ensures full a posteriori class coupling whilst retaining the simple a priori independent GP covariance structure, from which sparse approximations, such as multi-class Informative Vector Machines (IVM), emerge in a very natural and straightforward manner. This is the first time that a fully Variational Bayesian treatment for multi-class GP classification has been developed without having to resort to additional explicit approximations to the non-Gaussian likelihood term. Empirical comparisons with exact analysis via MCMC and with Laplace approximations illustrate the utility of the variational approximation as a computationally economical alternative to full MCMC, and it is shown to be more accurate than the Laplace approximation.

1 Introduction

In (Albert and Chib, 1993) it was first shown that by augmenting binary and multinomial probit regression models with a set of continuous latent variables $y_k$, corresponding to the $k$'th response value, where $y_k = m_k + \epsilon$, $\epsilon \sim N(0, 1)$ and $m_k = \sum_j \beta_{kj} x_j$, an exact Bayesian analysis can be performed by Gibbs sampling from the parameter posterior. As an example, consider binary probit regression on target variables $t_n \in \{0, 1\}$; the probit likelihood for the $n$th data sample taking unit value ($t_n = 1$) is $P(t_n = 1 \mid \mathbf{x}_n, \boldsymbol{\beta}) = \Phi(\boldsymbol{\beta}^{\mathsf{T}}\mathbf{x}_n)$, where $\Phi$ is the standardised Normal Cumulative Distribution Function (CDF). Now, this can be obtained by the following marginalisation, $\int P(t_n = 1, y_n \mid \mathbf{x}_n, \boldsymbol{\beta})\, dy_n = \int P(t_n = 1 \mid y_n)\, p(y_n \mid \mathbf{x}_n, \boldsymbol{\beta})\, dy_n$, and as by definition $P(t_n = 1 \mid y_n) = \delta(y_n > 0)$, we see that the required marginal is simply the normalising constant of a left-truncated univariate Gaussian, $P(t_n = 1 \mid \mathbf{x}_n, \boldsymbol{\beta}) = \int \delta(y_n > 0)\, N_{y_n}(\boldsymbol{\beta}^{\mathsf{T}}\mathbf{x}_n, 1)\, dy_n = \Phi(\boldsymbol{\beta}^{\mathsf{T}}\mathbf{x}_n)$.

The key observation here is that working with the joint distribution $P(t_n = 1, y_n \mid \mathbf{x}_n, \boldsymbol{\beta}) = \delta(y_n > 0)\, N_{y_n}(\boldsymbol{\beta}^{\mathsf{T}}\mathbf{x}_n, 1)$ provides a straightforward means of Gibbs sampling from the parameter posterior, which would not be the case if the marginal term, $\Phi(\boldsymbol{\beta}^{\mathsf{T}}\mathbf{x}_n)$, was employed in defining the joint distribution over data and parameters. This data augmentation strategy can be adopted in developing efficient methods to obtain binary and multi-class Gaussian Process (GP) (Williams and Rasmussen, 1996) classifiers, as will be presented in this paper.

With the exception of (Neal, 1998), where a full Markov Chain Monte Carlo (MCMC) treatment of GP-based classification is provided, all other approaches have focussed on methods to approximate the problematic form of the posterior [2] which allow analytic marginalisation to proceed. Laplace approximations to the posterior were developed in (Williams and Barber, 1998), whilst lower and upper bound quadratic likelihood approximations were considered in (Gibbs, 2000). Variational approximations for binary classification were developed in (Seeger, 2000), where a logit likelihood was considered, and mean field approximations were applied to probit likelihood terms in (Opper and Winther, 2000) and (Csato et al, 2000) respectively. Additionally, incremental (Quinonero-Candela and Winther, 2003) or sparse approximations based on Assumed Density Filtering (ADF) (Csato and Opper, 2002), Informative Vector Machines (IVM) (Lawrence et al, 2003) and Expectation Propagation (EP) (Minka, 2001; Kim, 2005) have been proposed. With the exceptions of (Williams and Barber, 1998; Gibbs, 2000; Seeger and Jordan, 2004; Kim, 2005), the focus of most recent work has largely been on the binary GP classification problem. In (Seeger and Jordan, 2004) a multi-class generalisation of the IVM is developed where the authors employ a multinomial-logit softmax likelihood. However, considerable representational effort is required to ensure that the scaling of computation and storage required of the proposed method matches that of the original IVM, with linear scaling in the number of classes. In contrast, by adopting the probabilistic representation of (Albert and Chib, 1993), we will see that GP-based K-class classification and efficient sparse approximations (IVM generalisations with scaling linear in the number of classes) can be realised by optimising a strict lower bound of the marginal likelihood of a multinomial probit regression model, which requires the solution of K computationally independent GP regression problems whilst still operating jointly (statistically) on the data. We will also show that the accuracy of this approximation is comparable to that obtained via MCMC. The following section

Footnotes:
[1] Matlab code to allow replication of the reported results is available at
[2] The likelihood is nonlinear in the parameters due to either the logistic or probit link functions required in the classification setting.
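The Albert and Chib construction described above can be made concrete with a short sketch. The following minimal Gibbs sampler for binary probit regression alternates between (i) drawing each latent $y_n$ from $N(\boldsymbol{\beta}^{\mathsf{T}}\mathbf{x}_n, 1)$ truncated to the side indicated by $t_n$, and (ii) drawing $\boldsymbol{\beta}$ from its Gaussian conditional. The flat prior on $\boldsymbol{\beta}$, the function names, and all numerical settings are illustrative assumptions for this sketch; they stand in for the GP prior over functions that the paper actually develops.

```python
import numpy as np
from statistics import NormalDist

_std = NormalDist()  # standard normal: supplies cdf and inv_cdf

def _sample_truncated(mean, positive, rng):
    # Inverse-CDF draw from N(mean, 1) truncated to y > 0 (positive) or y <= 0.
    lo = _std.cdf(-mean)                  # P(y <= 0) when y ~ N(mean, 1)
    u = rng.uniform()
    p = lo + u * (1.0 - lo) if positive else u * lo
    p = min(max(p, 1e-12), 1.0 - 1e-12)   # guard against inv_cdf(0) / inv_cdf(1)
    return mean + _std.inv_cdf(p)

def probit_gibbs(X, t, n_iter=300, seed=0):
    """Gibbs sampler for binary probit regression via the latent-variable
    augmentation y_n = beta^T x_n + eps, eps ~ N(0, 1), t_n = 1[y_n > 0].
    Assumes a flat prior on beta (an illustrative choice, not the paper's
    GP prior)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    V = np.linalg.inv(X.T @ X)            # posterior covariance of beta given y
    L = np.linalg.cholesky(V)
    beta = np.zeros(d)
    samples = np.empty((n_iter, d))
    for i in range(n_iter):
        # Step 1: y_n | beta, t_n ~ N(beta^T x_n, 1) truncated by the label.
        m = X @ beta
        y = np.array([_sample_truncated(mi, ti == 1, rng)
                      for mi, ti in zip(m, t)])
        # Step 2: beta | y ~ N(V X^T y, V)  (conjugate Gaussian update).
        beta = V @ (X.T @ y) + L @ rng.standard_normal(d)
        samples[i] = beta
    return samples
```

On synthetic data the posterior mean recovers the sign and rough scale of the generating coefficients; the step the paper takes is to replace the parametric mean $\boldsymbol{\beta}^{\mathsf{T}}\mathbf{x}_n$ and its flat prior with GP priors over function values, while keeping this same augmented joint distribution.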