MAXIMUM LIKELIHOOD AND BAYESIAN METHODS FOR MIXTURES OF NORMAL DISTRIBUTIONS

Peter M. Saama. UCLA Office of Academic Computing. November, 1997.

A. ABSTRACT

The data were waiting times between eruptions of the Old Faithful geyser in Yellowstone National Park, Wyoming, USA. The sample histogram showed evidence of bimodality. Accordingly, a two-component normal mixture model was fitted to the data. The Gauss-Newton algorithm was used to obtain the maximum likelihood estimates of the nuisance parameters in the mixture model. An alternative method, which uses the Gibbs sampler to obtain parameter estimates as well as $100(1-\alpha)\%$ credible bands for the parameters, was implemented and is presented.

B. INTRODUCTION

Because of overdispersion and heterogeneity in the population, a mixture of distributions is often used to model the quantitative response. Such distributions are often considered appropriate models for populations thought to consist of a number of relatively distinct sub-populations (c). In situations where the number of components is unknown, mixture densities of the form

$$\sum_{j=1}^{k} \pi_j \, N(\theta_j, \sigma_j^2)$$

have found their widest application as a model-based clustering procedure; $\pi_j$ is the probability that observation $y_i$ comes from the $j$th component of the mixture. Herein $\theta = (\lambda, \tau, \pi)$ will denote the set of all unknown parameters, and $p(\cdot \mid \cdot)$ is used to denote a generic conditional probability density function.

A mixture of two normal densities was first considered by Pearson in 1894. The parameter estimates were obtained by the method of moments, which involved the solution of a ninth-degree polynomial. The seminal paper on the EM algorithm (Dempster, Laird and Rubin, 1977) has greatly stimulated work on finite mixtures of distributions. Applications of mixture models reported by Titterington, Smith and Makov (1985) and McLachlan and Basford (1988) use the Expectation Maximization (EM) algorithm. Its disadvantages include:

• extreme slowness of convergence when the proportion of missing data is high;
• absence of standard errors from the information matrix at convergence.

Competitors of EM are Gauss-Newton (Lois, 1982; Aitkin et al, 1994), Fisher Scoring (Rao, 1948), and Differential Evolution (Price and Storn, 1997). The Gauss-Newton (GN) algorithm is not guaranteed to converge when the log-likelihood is not concave, but when it does converge its rate of convergence is usually quadratic, compared with the linear rate of EM. A hybrid EM-GN algorithm was proposed and implemented by Aitkin et al (1994).

A Bayesian analysis of mixture models presents certain advantages over the classical approaches. In theory, quantities of interest are written down as integrals of the form

$$E[F(\theta) \mid y^n] = \int_{\Theta} F(\theta) \, p(\theta \mid y^n) \, d\theta,$$

but in practice these integrals cannot be evaluated by traditional numerical methods. When the number of groups is assumed known, Markov chain Monte Carlo (MCMC) methods such as the Gibbs sampler can be used to perform the integration.

It is a well-known problem in finite mixture models that the parameters are fundamentally not identifiable, in that the likelihood is unchanged by permutations of the component labels $1, \ldots, k$ attached to the parameters of the $k$ components. In a Bayesian analysis, this typically leads to a joint density of the parameters that is highly multimodal, which causes label-switching in the Gibbs sampler output and makes inferences for individual components of the mixture meaningless. A common practice is to impose identifiability constraints on the model parameters, such as $\sigma_1 < \cdots < \sigma_k$, but this is often not a satisfactory solution (Diebolt and Robert, 1994). Stephens (1997) suggests a general solution which involves permuting samples from the parameter posterior density so as to remove as much multimodality as possible, and which allows interpretations for groups to be discovered rather than imposed.

The purpose of this paper is to present EM, Gauss-Newton, and MCMC algorithms for fitting a two-component mixture model in order to provide the reader with tools for practical mixture estimation.

C. MATERIALS AND METHODS

The data (Appendix: Table A.), taken from Venables, W.N. and B.D. Ripley (1995), are waiting times between eruptions of the Old Faithful geyser in Yellowstone National Park, Wyoming, USA. A histogram of the waiting times is shown in Figure 1.

Figure 1. Histogram for the waiting times between successive eruptions for the Old Faithful geyser, with non-parametric and parametric estimated densities superimposed.

From inspection of this figure, a mixture of two normal distributions would seem to be a reasonable descriptive model for the marginal distribution of waiting times.

EM/Gauss-Newton algorithm

If $y_i$, $i = 1, 2, \ldots, n$, is a sample waiting time, the log-likelihood function for a mixture of two normal components is:

$$L(\pi, \mu_1, \sigma_1, \mu_2, \sigma_2) = \sum_{i=1}^{n} \log\left[ \frac{\pi}{\sigma_1} \, \phi\!\left(\frac{y_i - \mu_1}{\sigma_1}\right) + \frac{1 - \pi}{\sigma_2} \, \phi\!\left(\frac{y_i - \mu_2}{\sigma_2}\right) \right]$$

The parameter estimates and their variances were obtained by minimizing $-L$ using an implementation of a hybrid EM/GN algorithm in the S-PLUS software. Initial values for the parameters can be obtained by the method of moments described in Everitt and Hand (1985), but for well-separated components initial values can be specified by reference to the sample histogram and were:

Gibbs Sampler

Assume that each observation for the waiting time, $y_i$, $i = 1, 2, \ldots, n$, is drawn from one of two groups with common variance. Let $T_i \in \{1, 2\}$ be the true group of the $i$th observation, where group $j$ has a normal distribution with mean $\lambda_j$ and precision $\tau$. Furthermore, assume that an unknown fraction $\pi$ of observations are in group 2, and $1 - \pi$ in group 1. The model is thus

$$y_i \sim \text{Normal}(\lambda_{T_i}, \tau), \qquad \pi \sim \text{Dirichlet}(\alpha_j).$$

A reparameterisation of this model, which ensures that the data do not all go into one component of the mixture, given by Robert (1994), is:

$$\lambda_2 = \lambda_1 + \omega, \qquad \omega > 0.$$

The directed acyclic graph for this model is shown below in Figure 2. The conjugate prior for $\theta = (\lambda, \tau, \pi)$ is of the form (Ro
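
As an illustration of the two-component log-likelihood $L(\pi, \mu_1, \sigma_1, \mu_2, \sigma_2)$ given in the EM/Gauss-Newton subsection, the following is a minimal Python sketch of a plain EM iteration. It is not the hybrid EM/GN S-PLUS implementation used in the paper. The function names (`normal_pdf`, `log_likelihood`, `em_step`, `fit_em`), the synthetic data, and the starting values are all assumptions made for illustration; the synthetic sample only mimics a bimodal shape and is not the Old Faithful data.

```python
import math
import random


def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) at y, i.e. phi((y - mu) / sigma) / sigma."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def log_likelihood(data, pi, mu1, sigma1, mu2, sigma2):
    """Log-likelihood of a two-component normal mixture (pi weights component 1)."""
    return sum(
        math.log(pi * normal_pdf(y, mu1, sigma1)
                 + (1.0 - pi) * normal_pdf(y, mu2, sigma2))
        for y in data
    )


def em_step(data, pi, mu1, sigma1, mu2, sigma2):
    """One EM update for the five mixture parameters."""
    # E-step: posterior probability that each observation is from component 1.
    w = []
    for y in data:
        a = pi * normal_pdf(y, mu1, sigma1)
        b = (1.0 - pi) * normal_pdf(y, mu2, sigma2)
        w.append(a / (a + b))
    n1 = sum(w)
    n2 = len(data) - n1
    # M-step: weighted means and standard deviations for each component.
    mu1 = sum(wi * y for wi, y in zip(w, data)) / n1
    mu2 = sum((1.0 - wi) * y for wi, y in zip(w, data)) / n2
    sigma1 = math.sqrt(sum(wi * (y - mu1) ** 2 for wi, y in zip(w, data)) / n1)
    sigma2 = math.sqrt(sum((1.0 - wi) * (y - mu2) ** 2 for wi, y in zip(w, data)) / n2)
    return n1 / len(data), mu1, sigma1, mu2, sigma2


def fit_em(data, start, tol=1e-8, max_iter=500):
    """Iterate EM until the log-likelihood gain falls below tol."""
    params = start
    ll = log_likelihood(data, *params)
    history = [ll]
    for _ in range(max_iter):
        params = em_step(data, *params)
        new_ll = log_likelihood(data, *params)
        history.append(new_ll)
        if abs(new_ll - ll) < tol:
            break
        ll = new_ll
    return params, history


# Synthetic stand-in sample (assumption): two well-separated normal groups.
rng = random.Random(0)
data = ([rng.gauss(55.0, 6.0) for _ in range(100)]
        + [rng.gauss(80.0, 6.0) for _ in range(200)])

# Hypothetical starting values, as if read off a histogram of the sample.
params, history = fit_em(data, start=(0.5, 50.0, 5.0, 85.0, 5.0))
pi_hat, mu1_hat, sigma1_hat, mu2_hat, sigma2_hat = params
```

The `history` list records the log-likelihood after each iteration, which makes the slow, monotone progress of EM easy to inspect. A Gauss-Newton or hybrid scheme would replace the M-step with a derivative-based update and would also yield standard errors, which this sketch does not attempt.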