您好,欢迎访问三七文档
当前位置:首页 > IT计算机/网络 > 其它相关文档 > 用贝叶斯方法重建基因进化历史
实验3用贝叶斯方法重建基因进化历史SNPsspeciestreegenetreemaximumparsimonyMPmaximumlikelihoodMLheuristic1.Bayesr.v.priordistributionBayesf(θ|D)=f(D|θ)f(θ)f(D)(1)f(θ|D)posteriorprobabilitydistributionf(θ)priorprobabilitydistributionf(D|θ)f(D)=∫f(D|θ)f(θ)Ωdθ(2)Ω=(Ψ,Φ)ΨΦψ=(τ,β)topologyτbranchlengthβφf(D|θ)Dφθ=(ψ,φ)(1)p(τ,β,φ|D)=L(D|τ,β,φ)p(τ,β,φ)∑∫∫L(D|τ,β,φ)p(τ,β,φ)dφdβΦBτ(3)(3)∑⋅Bmarginalprobabilitydistributionp(τ|D)=∫∫L(D|τ,β,φ)p(τ,β,φ)dφdβΦB∑∫∫L(D|τ,β,φ)p(τ,β,φ)dφdβΦBτ(4)MCMC1.1MarkovchainMonteCarlo,MCMCMetropolis-HastingΩ�θ(0),θ(1),θ(2),…��θ(i+1),θ(i+2),θ(i+3),…�θMarkovchainlawoflargenumbers�θ(0),θ(1),…,θ(i)�Ωθ1→θ2q(θ1,θ2)Metropolis-HastingθMetropolis-Hastingacceptθ∗R=min�1,p(θ∗|D)q(θ∗,θ)p(θ|D)q(θ,θ∗)�(5)qq(θ∗,θ)q(θ,θ∗)=1(5)R=min�1,L(D|θ∗)p(θ∗)L(D|θ)p(θ)�(6)R=1R1runif(1)αR𝛼1.MCMC1.θ(0)=�ψ(0),φ(0)�2.θ(i)=�ψ(i),φ(i)�(1)ψ(i)q1Φφ∗(6)RNGrandomnumbergeneratorφ(i+1)=φ∗φ(i+1)=φ(i)(2)φ(i+1)q2Ψψ∗nnψ(i+1)nMetropolis-Hasting3.MCMCMCMCMCMC1.2MCMC1MCMC2-(1)1startingtree2edgeNNInearest-neighborinterchangeinternalnodeSPRsubtreepruningandregraftingTBRtreebisectionandreconnection.31edge21.3MCMC[0,M]HKY/κU~Uniform(−δκ,δκ)κ∗=κ+UDirichletz=(z1,z2,…,zk)∑zi=ciz∗=cYY(αz1,αz2,…,αzk)DirichletαtuningparameterαHastingDirichlet2.BayesUniformMrBayes2.1{pA,pC,pT,pG}pA+pC+pT+pG=1Dirichlet2.2DNAgap4×416416-4=121230Q=�abcdefghijklmnop�1JC69Jukes-Cantor1969b=c=d=e=g=⋯=n=oa=f=k=p=−3bp(A)=p(C)=p(G)=p(T)=0.251MrBayeslsetnst=12K2PKimura19800.25transition--transversion-2MrBayeslsetnst=23HKYHasegawa-Kishino-Yano1985K2P4GTR6lsetnst=62.3MrBayesGammaGammaGammaGammashapeGamma2.42topologybranchlengthMrBayesMrBayes1.NexusNexusASCIIdimensions——ntaxncharformat——datatype=dnaDNAgap=-missing=matrix——interleave=yesinterleavedsequencesnexusMacCladeMesquiteMrbayesthefullNexusstandardMrbayes23(23)(2,3){23}{2,3}DNA{A,C,G,T,R,Y,M,K,S,W,H,B,V,D,N}RNA{A,C,G,U,R,Y,M,K,S,W,H,B,V,D,N}Protein{A,R,N,D,C,Q,E,G,H,I,L,K,M,F,P,S,T,W,Y,V,X}{0,1}{0,1,2,3,4,5,6,5,7,8,9}2.executefilenameexefilename3.lsetprsetlsetprset3.1lset1)showmodel2)helplset3)nucmodelDNAnucleotidesubstitutionmodel4by4doubletpairedstemregionsofribosomalDNAcodonDNAsequenceintermsofitscodons4)nstF81JCmodelGTRnst=65)codeDNAcodonPloidy6)ratesinvgamma(gamma-shapedratevariationwithaproportionofinvariablesites)7)ngammacat(gammathenumberofdiscretecategoriesusedtoapproximatethegammadistribution)48)covarionparsmodelparsimonymodelthecovariotidemodel9)helplset3.2prior6thetopology,thebranchlengths,thefourstationaryfrequenciesofthenucleotides,6thesixdifferentnucleotidesubstitutionrates,theproportionofinvariablesites,GammatheshapeparameterofthegammadistributionofratevariationhelpprsetRevmatpr(GTR6),Statefreqpr(GTR4),Shapepr(Gamma),Pinvarpr(),Topologypr(),Brlenspr()RevmatprandStatefreqprpriorprobabilitydensityaflatDirichlet(1.0)StatefreqprequalJCandSYMprsetstatefreqpr=fixed(equal)statefreqprflatDirichletpriorequalnucleotidefrequenciesprsetstatefreqpr=Dirichlet(10,10,10,10)prsetstatefreqpr=Dirichlet(100,100,100,100)prsetstatefreqpr=Dirichlet(1,1,1,1)prsst=Dir(1,1,1,1)Shapeprthepriorfortheα(shape)parameterofthegammadistributionofratevariation.PinvarprthepriorfortheproportionofinvariablesitesTopologypruniformputsequalprobabilityonalldistinct,fullyresolvedtopologies.Thealternativeistoconstrainsomenodesinthetreetoalwaysbepresentbutwewillnotattemptthatinthisanalysis.Brlensprunconstrainedclock-constrainedunconstrainedbrlensprexponentialuniform10.0showmodel4.mcmchelpmcmcSeedSwapseedthechainswappingsequenceNgennumberofgenerationsngenmcmcpmcmcpngen=10000bayes(Nruns=2)McmcdiagnyesDiagnfreq1000bayes1000filename.mcmcthetreesamples1000(burnin)(burninfrac)Relburni(relburnin=no)(relburnin=yes)(relburnin=yesandburninfrac=0.25)25bayesMetropoliscouplingtheMCMCsamplingofthetargetdistributionSwapfreq,Nswaps,NchainsTempMetropoliscouplingNchains1heatingnn-1heatedchainsn4bayes31coldchainheating50MPIBayesanincrementalheatingschemeiheatedthepower1/(1+iλ)λTempHeatingflattenouttheposteriorprobabilityisolatedpeakscoldchainSwapfreq3(numberofswapsNswaps)Samplefreq10010mcmcpsamplefreq=10filename.prunfilename.nex.run1.pfilename.nex.run2.pfilename.tfilename.run1.tfilename.run2.tPrintfreq100bayesfilename.tStartingtree5.mcmcmcmcpmcmcTheproposalprobabilitiespropsMarkov1001254loglikelihoodvaluesMetropoliscouplingMetropoliscouplingtemperaturedifference6.yestheaveragestandarddeviationofsplitfrequencies0convergencediagnostictheloglikelihoodvalues7.SummarizingSamplesofSubstitutionModelParameterssamplefreq10filename.pID21)thegenerationnumberGen2)theloglikelihoodofthecoldchainLnL3)thetotaltreelengthTL4)6GTRthesixGTRrateparametersr(A-C),r(A-G)5)4thefourstationarynucleotidefrequenciespi(A),pi(C)6)Gammatheshapeparameterofthegammadistributionofratevariationalpha7)theproportionofinvariablesitespinvar7.1Sumpsumpsumpburnin=250filename.p25sumpngentheloglikelihoodvaluessumpMeanVariance95%CredibleintervalLowerUpperMedianPSRFPotentialScaleReductionFactorfilename.pPSRF1.08.SummarizingSamplesofTreesandBranchLengthsfilename.tnexus:,sumtsumtburnin=250Sumtsummarystatisticsforth
本文标题:用贝叶斯方法重建基因进化历史
链接地址:https://www.777doc.com/doc-4372456 .html