Girolami, M. and Rogers, S. (2006) Variational Bayesian multinomial probit regression with Gaussian process priors. Neural Computation 18(8): pp. 1790-1817. Deposited 9 November 2007, Glasgow ePrints Service.

Department of Computing Science, University of Glasgow. Technical Report: TR-2005-205. {girolami,srogers}@dcs.gla.ac.uk. November 9, 2005.

Abstract

It is well known in the statistics literature that augmenting binary and polychotomous response models with Gaussian latent variables enables exact Bayesian analysis via Gibbs sampling from the parameter posterior. By adopting such a data augmentation strategy, dispensing with priors over regression coefficients in favour of Gaussian Process (GP) priors over functions, and employing variational approximations to the full posterior, we obtain efficient computational methods for Gaussian Process classification in the multi-class setting [1]. The model augmentation with additional latent variables ensures full a posteriori class coupling whilst retaining the simple a priori independent GP covariance structure, from which sparse approximations, such as multi-class Informative Vector Machines (IVM), emerge in a very natural and straightforward manner. This is the first time that a fully Variational Bayesian treatment for multi-class GP classification has been developed without having to resort to additional explicit approximations to the non-Gaussian likelihood term. Empirical comparisons with exact analysis via MCMC and with Laplace approximations illustrate the utility of the variational approximation as a computationally economical alternative to full MCMC, and it is shown to be more accurate than the Laplace approximation.

1 Introduction

In (Albert and Chib, 1993) it was first shown that by augmenting binary and multinomial probit regression models with a set of continuous latent variables $y_k$, corresponding to the $k$'th response value, where $y_k = m_k + \epsilon$, $\epsilon \sim N(0, 1)$ and $m_k = \sum_j \beta_{kj} x_j$, an exact Bayesian analysis can be performed by Gibbs sampling from the parameter posterior. As an example, consider binary probit regression on target variables $t_n \in \{0, 1\}$; the probit likelihood for the $n$th data sample taking unit value ($t_n = 1$) is $P(t_n = 1 \mid \mathbf{x}_n, \boldsymbol{\beta}) = \Phi(\boldsymbol{\beta}^{\mathsf{T}}\mathbf{x}_n)$, where $\Phi$ is the standardised Normal Cumulative Distribution Function (CDF). Now, this can be obtained by the following marginalisation, $\int P(t_n = 1, y_n \mid \mathbf{x}_n, \boldsymbol{\beta})\, dy_n = \int P(t_n = 1 \mid y_n)\, p(y_n \mid \mathbf{x}_n, \boldsymbol{\beta})\, dy_n$, and as by definition $P(t_n = 1 \mid y_n) = \delta(y_n > 0)$, we see that the required marginal is simply the normalising constant of a left-truncated univariate Gaussian, $P(t_n = 1 \mid \mathbf{x}_n, \boldsymbol{\beta}) = \int \delta(y_n > 0)\, N_{y_n}(\boldsymbol{\beta}^{\mathsf{T}}\mathbf{x}_n, 1)\, dy_n = \Phi(\boldsymbol{\beta}^{\mathsf{T}}\mathbf{x}_n)$.

The key observation here is that working with the joint distribution $P(t_n = 1, y_n \mid \mathbf{x}_n, \boldsymbol{\beta}) = \delta(y_n > 0)\, N_{y_n}(\boldsymbol{\beta}^{\mathsf{T}}\mathbf{x}_n, 1)$ provides a straightforward means of Gibbs sampling from the parameter posterior, which would not be the case if the marginal term, $\Phi(\boldsymbol{\beta}^{\mathsf{T}}\mathbf{x}_n)$, was employed in defining the joint distribution over data and parameters. This data augmentation strategy can be adopted in developing efficient methods to obtain binary and multi-class Gaussian Process (GP) (Williams and Rasmussen, 1996) classifiers, as will be presented in this paper.

With the exception of (Neal, 1998), where a full Markov Chain Monte Carlo (MCMC) treatment of GP-based classification is provided, all other approaches have focussed on methods to approximate the problematic form of the posterior [2] which allow analytic marginalisation to proceed. Laplace approximations to the posterior were developed in (Williams and Barber, 1998), whilst lower and upper bound quadratic likelihood approximations were considered in (Gibbs, 2000). Variational approximations for binary classification were developed in (Seeger, 2000), where a logit likelihood was considered, and mean field approximations were applied to probit likelihood terms in (Opper and Winther, 2000) and (Csato et al, 2000) respectively. Additionally, incremental (Quinonero-Candela and Winther, 2003) or sparse approximations based on Assumed Density Filtering (ADF) (Csato and Opper, 2002), Informative Vector Machines (IVM) (Lawrence et al, 2003) and Expectation Propagation (EP) (Minka, 2001; Kim, 2005) have been proposed. With the exceptions of (Williams and Barber, 1998; Gibbs, 2000; Seeger and Jordan, 2004; Kim, 2005), the focus of most recent work has largely been on the binary GP classification problem. In (Seeger and Jordan, 2004) a multi-class generalisation of the IVM is developed where the authors employ a multinomial-logit softmax likelihood. However, considerable representational effort is required to ensure that the scaling of computation and storage required of the proposed method matches that of the original IVM, with linear scaling in the number of classes. In contrast, by adopting the probabilistic representation of (Albert and Chib, 1993), we will see that GP-based K-class classification and efficient sparse approximations (IVM generalisations with scaling linear in the number of classes) can be realised by optimising a strict lower bound of the marginal likelihood of a multinomial probit regression model, which requires the solution of K computationally independent GP regression problems whilst still operating jointly (statistically) on the data. We will also show that the accuracy of this approximation is comparable to that obtained via MCMC. The following section

Footnotes:
[1] Matlab code to allow replication of the reported results is available at
[2] The likelihood is nonlinear in the parameters due to either the logistic or probit link functions required in the classification setting.
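The Albert and Chib construction described above can be made concrete with a short sketch. The following minimal Gibbs sampler for binary probit regression alternates between (i) drawing each latent $y_n$ from $N(\boldsymbol{\beta}^{\mathsf{T}}\mathbf{x}_n, 1)$ truncated to the side indicated by $t_n$, and (ii) drawing $\boldsymbol{\beta}$ from its Gaussian conditional. The flat prior on $\boldsymbol{\beta}$, the function names, and all numerical settings are illustrative assumptions for this sketch; they stand in for the GP prior over functions that the paper actually develops.

```python
import numpy as np
from statistics import NormalDist

_std = NormalDist()  # standard normal: supplies cdf and inv_cdf

def _sample_truncated(mean, positive, rng):
    # Inverse-CDF draw from N(mean, 1) truncated to y > 0 (positive) or y <= 0.
    lo = _std.cdf(-mean)                  # P(y <= 0) when y ~ N(mean, 1)
    u = rng.uniform()
    p = lo + u * (1.0 - lo) if positive else u * lo
    p = min(max(p, 1e-12), 1.0 - 1e-12)   # guard against inv_cdf(0) / inv_cdf(1)
    return mean + _std.inv_cdf(p)

def probit_gibbs(X, t, n_iter=300, seed=0):
    """Gibbs sampler for binary probit regression via the latent-variable
    augmentation y_n = beta^T x_n + eps, eps ~ N(0, 1), t_n = 1[y_n > 0].
    Assumes a flat prior on beta (an illustrative choice, not the paper's
    GP prior)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    V = np.linalg.inv(X.T @ X)            # posterior covariance of beta given y
    L = np.linalg.cholesky(V)
    beta = np.zeros(d)
    samples = np.empty((n_iter, d))
    for i in range(n_iter):
        # Step 1: y_n | beta, t_n ~ N(beta^T x_n, 1) truncated by the label.
        m = X @ beta
        y = np.array([_sample_truncated(mi, ti == 1, rng)
                      for mi, ti in zip(m, t)])
        # Step 2: beta | y ~ N(V X^T y, V)  (conjugate Gaussian update).
        beta = V @ (X.T @ y) + L @ rng.standard_normal(d)
        samples[i] = beta
    return samples
```

On synthetic data the posterior mean recovers the sign and rough scale of the generating coefficients; the step the paper takes is to replace the parametric mean $\boldsymbol{\beta}^{\mathsf{T}}\mathbf{x}_n$ and its flat prior with GP priors over function values, while keeping this same augmented joint distribution.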