Advanced Bayesian Methods in Machine Learning

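Bayes' Theory

Contents
• Bayes Review
• Conjugate Prior
• Difficulty in Bayes
• Bayes Network

Bayes Review
• Toss a coin three times; the results are heads, heads, heads.
• Frequentist view: the maximum-likelihood estimate is P(heads) = 3/3 = 1, i.e., the coin is judged to be completely biased.
• Bayesian view, with θ the probability of heads and x the observed tosses:
  Prior: $p(\theta)$
  Likelihood: $p(x \mid \theta) = \theta^3$
  Posterior: $p(\theta \mid x) = \dfrac{p(x \mid \theta)\, p(\theta)}{p(x)} \propto \theta^3 p(\theta)$
• What happens as the number of samples $n \to \infty$?

Why we use Bayes in machine learning: see Machine Learning [Mitchell].

Conjugate Prior
• In Bayesian probability theory, if the posterior distribution $p(\theta \mid x)$ is in the same family as the prior distribution $p(\theta)$, the prior and posterior are called conjugate distributions, and the prior is called a conjugate prior for the likelihood function.
• Binomial likelihood with a Beta prior gives a Beta posterior.
• Poisson or exponential likelihood (prior on the rate), or normal likelihood with known mean (prior on the precision), with a Gamma prior gives a Gamma posterior.
• Normal likelihood with known variance and a normal prior on the mean gives a normal posterior.
• Multinomial likelihood with a Dirichlet prior gives a Dirichlet posterior.

Beta distribution:
  $\mathrm{Beta}(x \mid \alpha, \beta) = \dfrac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, x^{\alpha - 1} (1 - x)^{\beta - 1}$
[Figure: density of Beta(20, 20)]
• Let the prior on the probability of heads p be Beta(α, β). If tossing the coin gives 10 heads and 5 tails, the posterior is Beta(α + 10, β + 5).
• With a binomial likelihood and a Beta prior (here α = β = 1, i.e., a uniform prior), the posterior is also Beta:
  $P(p) = \dfrac{p^{\alpha - 1} (1 - p)^{\beta - 1}}{B(\alpha, \beta)}, \qquad p \sim \mathrm{Beta}(\alpha, \beta)$

Difficulty in Bayes
(See Machine Learning [Mitchell].)
• One common criticism of the Bayesian approach is that the prior distribution is often selected on the basis of mathematical convenience rather than as a reflection of any prior beliefs.
• Even the subjective nature of the conclusions through their dependence on the choice of prior is seen by some as a source of difficulty.
• Reducing the dependence on the prior is one motivation for so-called noninformative priors. However, these lead to difficulties when comparing different models, and indeed Bayesian methods based on poor choices of prior can give poor results with high confidence.
• Frequentist evaluation methods offer some protection from such problems, and techniques such as cross-validation remain useful in areas such as model comparison. [PRML]

• As we saw in the previous section, Bayesian statistics makes inferences about θ from its posterior distribution, and in many cases these inferences require integration. For example, the expectation of a function g(θ) is
  $E[g(\theta) \mid x] = \int g(\theta)\, f(\theta \mid x)\, d\theta$
  where f denotes the posterior distribution. When g(θ) = θ, this gives a point estimate of θ.
• For many Bayesian inference problems, however, the posterior is so complex that the integral has no closed-form solution and numerical methods are hard to apply; in other cases multiple integrals are required (for example, when the posterior is multivariate).
• The practical application of Bayesian methods was for a long time severely limited by the difficulties in carrying through the full Bayesian procedure, particularly the need to marginalize (sum or integrate) over the whole of parameter space, which, as we shall see, is required in order to make predictions or to compare different models. [PRML]
• MCMC (Markov chain Monte Carlo)
• Monte Carlo methods are very flexible and can be applied to a wide range of models. However, they are computationally intensive and have mainly been used for small-scale problems.
• More recently, highly efficient deterministic approximation schemes such as variational Bayes and expectation propagation have been developed.
• These offer a complementary alternative to sampling methods and have allowed Bayesian techniques to be used in large-scale applications. [PRML]

Bayes Network
• Naive Bayes classification has one restrictive assumption: the features must be conditionally independent given the class. When this assumption holds, naive Bayes is at its most accurate; unfortunately, in practice the features are often not conditionally independent but strongly correlated, which limits what naive Bayes can do. A more advanced and more widely applicable Bayesian classification method is the Bayesian network.

Probabilistic Graphical Models
• Modeling (how to encode a graph)
• Inference (given data and a known graph)
• Learning (given data: learning the parameters; learning the structure of the graph)
• Advantage of PGMs: knowledge meets data
• A Bayesian network is a kind of probabilistic graphical model (a directed acyclic graph).

Modeling
Questions (network over Difficulty, Intelligence, Grade, SAT, Letter):
1. P(Letter) = P(Letter | Grade)?
2. P(Letter) = P(Letter | Difficulty)?
3. P(Letter | Grade) = P(Letter | Grade, Difficulty)?
4. P(Grade | Difficulty) = P(Grade | Difficulty, Intelligence)?
5. P(Grade | Intelligence) = P(Grade | SAT, Intelligence)?
6. P(Difficulty) = P(Difficulty | Intelligence)?
7. P(Difficulty | Grade) = P(Difficulty | Grade, Intelligence)?
The network factorizes the joint distribution as
  $P(D, I, G, S, L) = P(D)\, P(I)\, P(G \mid D, I)\, P(S \mid I)\, P(L \mid G)$
Answers: 1. False, 2. False, 3. True, 4. False, 5. True, 6. True, 7. False

Inference
• Task: observe evidence e and answer queries such as
  – Likelihood: $P(E = e)$
  – Conditional probability: $P(Y \mid E = e)$
  – Maximum a posteriori: $\arg\max_{y} P(Y = y \mid E = e)$
• Exact inference
• Approximate inference
  – Loopy belief propagation
  – Variational inference (mean-field approximation)
  – Monte Carlo methods (Markov chain Monte Carlo)

Approximate Inference
• Sampling (Monte Carlo) methods. Sampling approximates the true distribution by drawing a large number of samples. The simplest is importance sampling, which draws samples from an easier proposal distribution and weights each sample by the ratio of the target density to the proposal density. In high-dimensional spaces a more effective approach is Markov chain Monte Carlo, which uses the properties of a Markov chain to generate samples from a target distribution; examples include the Metropolis-Hastings algorithm and Gibbs sampling.
• Variational inference. Variational inference takes a different approach: by restricting the family of approximating distributions, it obtains an approximate posterior that is only locally optimal but has a deterministic solution. The simplest method is the mean-field approximation, which decouples all nodes in the graphical model and treats them as mutually independent, introduces a variational parameter for each node, and iteratively updates these parameters to minimize the KL divergence between the approximating and the true distribution.
• Loopy belief propagation. The belief propagation algorithm is designed for acyclic graphs; LBP simply runs the same algorithm whether or not the graph contains cycles.

Learning
• Learning covers several cases: estimating the parameters, estimating the model structure, or both. Learning methods can be classified by whether the variables are observed and whether the structure is known.
[Figure: learning methods classified by whether variables are observed and whether the structure is known]

Learning Parameter
• If every random variable is directly observed, training is straightforward and similar to training a naive Bayes classifier.
• Bayesian networks usually contain hidden (latent) variable nodes, however, and training is then more complicated.
• In the Bayesian framework, parameter estimation means performing Bayesian estimation over all possible models; Bayesian parameter estimation is then equivalent to inference, with the parameters represented by hidden nodes.
• Problem: there are three coins, A, B, and C, whose probabilities of landing heads are π, p, and q, respectively. The experiment: first toss coin A; if it lands heads, choose coin B, otherwise choose coin C. Then toss the chosen coin and record 1 for heads and 0 for tails.
• Repeat independently n times (n = 10); the results are 1, 1, 1, 1, 1, 1, 0, 0, 0, 0.
• Only the outcome of the final toss is observed, not the process. Estimate the three parameters π, p, and q.
• There is no closed-form solution; this motivates the EM algorithm.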
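The three-coin problem is the standard motivating example for EM. The sketch below is a minimal EM loop for it, not the author's code: the function name `em_three_coins`, the initial values (π, p, q) = (0.4, 0.6, 0.7), and the fixed iteration count are assumptions chosen for illustration. The E-step computes μ_j, the posterior probability that coin B produced observation y_j; the M-step re-estimates π, p, and q in closed form from those responsibilities.

```python
def em_three_coins(ys, pi, p, q, iters=100):
    """EM for the three-coin model (hypothetical helper); returns (pi, p, q)."""
    n = len(ys)
    for _ in range(iters):
        # E-step: mu_j = P(coin B was tossed | y_j, current parameters).
        mus = []
        for y in ys:
            b = pi * (p ** y) * ((1 - p) ** (1 - y))
            c = (1 - pi) * (q ** y) * ((1 - q) ** (1 - y))
            mus.append(b / (b + c))
        # M-step: closed-form maximizers of the expected complete-data log-likelihood.
        s_mu = sum(mus)
        pi = s_mu / n
        p = sum(m * y for m, y in zip(mus, ys)) / s_mu
        q = sum((1 - m) * y for m, y in zip(mus, ys)) / (n - s_mu)
    return pi, p, q


ys = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]  # observations from the problem (n = 10)
pi, p, q = em_three_coins(ys, pi=0.4, p=0.6, q=0.7)  # initial values are an arbitrary choice
print(f"pi={pi:.4f}, p={p:.4f}, q={q:.4f}")  # pi=0.4064, p=0.5368, q=0.6432
print(f"P(y=1) = {pi * p + (1 - pi) * q:.4f}")  # P(y=1) = 0.6000
```

Each observation is Bernoulli with P(y = 1) = πp + (1 − π)q, so the observed-data likelihood depends on the three parameters only through that single quantity. After one M-step it already equals the sample frequency 6/10, and EM stops there; different starting values lead to different (π, p, q), which is why the estimates depend on initialization.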
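Returning to the conjugate-prior slides, the Beta-binomial update can be checked numerically. This is a minimal sketch; the helper names `beta_pdf` and `update` are hypothetical, and combining the uniform prior (α = β = 1) with the 10-heads, 5-tails data is an assumption that joins two separate bullets. The loop confirms conjugacy: prior × likelihood divided by the Beta(α + 10, β + 5) density is the same constant at every p, namely the evidence P(data).

```python
from math import comb, gamma


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, for 0 < x < 1."""
    return gamma(a + b) / (gamma(a) * gamma(b)) * x ** (a - 1) * (1 - x) ** (b - 1)


def update(a, b, heads, tails):
    """Beta(a, b) prior + binomial likelihood -> Beta(a + heads, b + tails) posterior."""
    return a + heads, b + tails


# Uniform prior (alpha = beta = 1), then 10 heads and 5 tails.
a0, b0 = 1, 1
a1, b1 = update(a0, b0, heads=10, tails=5)
print(a1, b1)            # 11 6
print(a1 / (a1 + b1))    # posterior mean of p

# prior * likelihood / posterior density is constant in p (it equals P(data)).
for p in (0.2, 0.5, 0.8):
    unnormalized = beta_pdf(p, a0, b0) * comb(15, 10) * p**10 * (1 - p) ** 5
    print(round(unnormalized / beta_pdf(p, a1, b1), 6))  # 0.0625 at every p

# Three heads in three tosses: the frequentist MLE is 1, the posterior mean is 0.8.
a3, b3 = update(a0, b0, heads=3, tails=0)
print(a3 / (a3 + b3))    # 0.8
```

The printed constant 0.0625 = 1/16 reflects the known fact that, under a uniform prior, every head count out of n = 15 tosses has marginal probability 1/(n + 1).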
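The posterior expectation $E[g(\theta) \mid x] = \int g(\theta) f(\theta \mid x)\, d\theta$ from the "Difficulty in Bayes" slides can be approximated by averaging g over posterior draws. The sketch below assumes the Beta(11, 6) posterior of the uniform-prior coin example so that exact answers are available for comparison; `mc_expectation` is a hypothetical helper, and the draw count and seed are arbitrary.

```python
import random


def mc_expectation(g, sampler, n=100_000):
    """Approximate E[g(theta) | x] by averaging g over n posterior draws."""
    return sum(g(sampler()) for _ in range(n)) / n


rng = random.Random(0)  # fixed seed for reproducibility
a, b = 11, 6            # assumed posterior Beta(11, 6) from the coin example


def draw():
    return rng.betavariate(a, b)


mean_mc = mc_expectation(lambda t: t, draw)        # g(theta) = theta: point estimate
second_mc = mc_expectation(lambda t: t * t, draw)  # g(theta) = theta^2

mean_exact = a / (a + b)
second_exact = a * (a + 1) / ((a + b) * (a + b + 1))
print(f"E[theta]   MC={mean_mc:.4f}  exact={mean_exact:.4f}")
print(f"E[theta^2] MC={second_mc:.4f}  exact={second_exact:.4f}")
```

Here direct sampling is easy because the posterior is a standard distribution; the MCMC methods in the slides address the harder case where it is known only up to a constant.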
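The Modeling answers follow from the factorization $P(D)P(I)P(G \mid D, I)P(S \mid I)P(L \mid G)$, and they can be checked by brute-force enumeration. In this sketch only the graph structure comes from the slides: every variable is assumed binary and all CPT numbers are hypothetical. `prob` computes any conditional probability by summing the joint.

```python
from itertools import product

# Hypothetical CPTs; all variables binary (0/1). Only the graph structure is from the slides.
P_D = {1: 0.4, 0: 0.6}  # Difficulty: 1 = hard
P_I = {1: 0.3, 0: 0.7}  # Intelligence: 1 = high
P_G1 = {(0, 0): 0.7, (0, 1): 0.95, (1, 0): 0.3, (1, 1): 0.8}  # P(G=1 | D, I)
P_S1 = {0: 0.05, 1: 0.8}  # P(S=1 | I)
P_L1 = {0: 0.1, 1: 0.9}   # P(L=1 | G)
NAMES = "DIGSL"


def bern(p1, v):
    """P(X = v) for a binary X with P(X = 1) = p1."""
    return p1 if v == 1 else 1 - p1


def joint(d, i, g, s, l):
    """P(D, I, G, S, L) = P(D) P(I) P(G | D, I) P(S | I) P(L | G)."""
    return (P_D[d] * P_I[i] * bern(P_G1[(d, i)], g)
            * bern(P_S1[i], s) * bern(P_L1[g], l))


def prob(query, given=None):
    """P(query | given) by enumeration; arguments are dicts such as {"L": 1}."""
    given = given or {}

    def total(fixed):
        acc = 0.0
        for vals in product((0, 1), repeat=5):
            assign = dict(zip(NAMES, vals))
            if all(assign[k] == v for k, v in fixed.items()):
                acc += joint(*vals)
        return acc

    return total({**given, **query}) / total(given)


for label, lhs, rhs in [
    ("Q1 (differ)", prob({"L": 1}), prob({"L": 1}, {"G": 1})),
    ("Q3 (equal)", prob({"L": 1}, {"G": 1}), prob({"L": 1}, {"G": 1, "D": 1})),
    ("Q6 (equal)", prob({"D": 1}), prob({"D": 1}, {"I": 1})),
    ("Q7 (differ)", prob({"D": 1}, {"G": 1}), prob({"D": 1}, {"G": 1, "I": 1})),
]:
    print(label, round(lhs, 4), round(rhs, 4))
```

Q7 shows "explaining away": D and I are independent a priori, but once the common child G is observed, learning I changes the belief about D.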
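Importance sampling, listed as the simplest sampling method under Approximate Inference, can be sketched on the same coin posterior. Assumptions: the target is the unnormalized posterior θ¹⁰(1 − θ)⁵ (uniform prior, 10 heads, 5 tails), the proposal is Uniform(0, 1), and the weights are self-normalized so the unknown normalizing constant cancels. The function names are hypothetical.

```python
import random


def unnormalized_posterior(theta):
    """theta^10 (1 - theta)^5: uniform prior times the likelihood of 10 heads, 5 tails."""
    return theta**10 * (1 - theta) ** 5


def self_normalized_is(g, n=100_000, seed=0):
    """Estimate E[g(theta) | x] with a Uniform(0, 1) proposal and self-normalized weights."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        theta = rng.random()               # draw from the proposal q(theta) = 1 on [0, 1)
        w = unnormalized_posterior(theta)  # weight = target / proposal, up to a constant
        num += w * g(theta)
        den += w
    return num / den


est = self_normalized_is(lambda t: t)
print(est, 11 / 17)  # estimate vs exact Beta(11, 6) mean
```

A uniform proposal works here because the parameter is one-dimensional; in high dimensions most proposal draws receive negligible weight, which is the motivation the slides give for MCMC.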
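Metropolis-Hastings, one of the MCMC algorithms named under Approximate Inference, needs the target only up to a constant. This minimal random-walk sketch again assumes the unnormalized coin posterior θ¹⁰(1 − θ)⁵. The Gaussian proposal is symmetric, so the acceptance ratio reduces to p(y)/p(x). The step size, burn-in length, and seed are arbitrary choices, not tuned values.

```python
import math
import random


def log_target(theta):
    """Unnormalized log posterior: uniform prior times 10 heads, 5 tails."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return 10 * math.log(theta) + 5 * math.log(1 - theta)


def metropolis_hastings(log_p, x0, n_samples, step=0.1, burn_in=1000, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    x, lp = x0, log_p(x0)
    samples = []
    for t in range(burn_in + n_samples):
        y = x + rng.gauss(0.0, step)  # symmetric proposal, so the q terms cancel
        lp_y = log_p(y)
        # Accept with probability min(1, p(y) / p(x)), computed in log space.
        if lp_y >= lp or rng.random() < math.exp(lp_y - lp):
            x, lp = y, lp_y
        if t >= burn_in:
            samples.append(x)
    return samples


samples = metropolis_hastings(log_target, x0=0.5, n_samples=50_000)
print(sum(samples) / len(samples), 11 / 17)  # chain average vs exact Beta(11, 6) mean
```

Proposals outside (0, 1) get log-density −∞ and are always rejected, so the chain never leaves the support. Gibbs sampling, the other algorithm named in the slides, instead updates one variable at a time from its full conditional and needs at least two variables to be meaningful.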
