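Advanced Machine Learning
Generative versus discriminative classifier
Eric Xing
Lecture 4, August 10, 2009
Reading:
© Eric Xing @ CMU, 2006-2009

Discussion: Generative and discriminative classifiers
- Goal: Wish to learn f: X → Y, e.g., P(Y|X)
- Generative:
  - Modeling the joint distribution of all data
- Discriminative:
  - Modeling only points at the boundary

Generative vs. Discriminative Classifiers
- Goal: Wish to learn f: X → Y, e.g., P(Y|X)
- Generative classifiers (e.g., Naïve Bayes):
  - Assume some functional form for P(X|Y), P(Y). This is a 'generative' model of the data!
  - Estimate parameters of P(X|Y), P(Y) directly from training data
  - Use Bayes rule to calculate P(Y|X=x)
- Discriminative classifiers (e.g., logistic regression):
  - Directly assume some functional form for P(Y|X). This is a 'discriminative' model of the data!
  - Estimate parameters of P(Y|X) directly from training data
[Figure: two graphical models, each over the nodes Y_n and X_n]

Suppose you know the following…
- Class-specific Dist.: P(X|Y)
  $$p(X \mid Y = 1) = p_1(X; \vec{\mu}_1, \Sigma_1)$$
  $$p(X \mid Y = 2) = p_2(X; \vec{\mu}_2, \Sigma_2)$$
- Class prior (i.e., weight): P(Y)
- This is a generative model of the data!
- Bayes classifier: [equation missing]

Optimal classification
- Theorem: Bayes classifier is optimal!
  - That is: [equation missing]
- How to learn a Bayes classifier?
  - Recall density estimation. We need to estimate P(X|y=k) and P(y=k) for all k

Gaussian Discriminative Analysis
- Learning f: X → Y, where
  - X is a vector of real-valued features, $X_n = \langle X_{n,1}, \dots, X_{n,m} \rangle$
  - Y is an indicator vector
- What does that imply about the form of P(Y|X)?
  - The joint probability of a datum and its label is:
    $$p(x_n, y_n^k = 1 \mid \mu, \sigma) = p(y_n^k = 1) \times p(x_n \mid y_n^k = 1, \mu, \sigma) = \pi_k \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{ -\frac{1}{2\sigma^2} (x_n - \mu_k)^2 \right\}$$
  - Given a datum $x_n$, we predict its label using the conditional probability of the label given the datum:
    $$p(y_n^k = 1 \mid x_n, \mu, \sigma) = \frac{\pi_k \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{ -\frac{1}{2\sigma^2} (x_n - \mu_k)^2 \right\}}{\sum_{k'} \pi_{k'} \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{ -\frac{1}{2\sigma^2} (x_n - \mu_{k'})^2 \right\}}$$
[Figure: graphical model over the nodes Y_n and X_n]

Conditional Independence
- X is conditionally independent of Y given Z if the probability distribution governing X is independent of the value of Y, given the value of Z, which we often write:
  $$P(X \mid Y, Z) = P(X \mid Z)$$
- e.g., [example missing]
- Equivalent to:
  $$P(X, Y \mid Z) = P(X \mid Z) \, P(Y \mid Z)$$

Naïve Bayes Classifier
- When X is a multivariate-Gaussian vector:
  - The joint probability of a datum and its label is:
    $$p(x_n, y_n^k = 1 \mid \vec{\mu}, \Sigma) = p(y_n^k = 1) \times p(x_n \mid y_n^k = 1, \vec{\mu}, \Sigma) = \pi_k \frac{1}{|2\pi\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2} (x_n - \vec{\mu}_k)^T \Sigma^{-1} (x_n - \vec{\mu}_k) \right\}$$
  - The naïve Bayes simplification:
    $$p(x_n, y_n^k = 1 \mid \mu, \sigma) = p(y_n^k = 1) \times \prod_j p(x_{n,j} \mid y_n^k = 1, \mu_{k,j}, \sigma_{k,j}) = \pi_k \prod_j \frac{1}{(2\pi\sigma_{k,j}^2)^{1/2}} \exp\left\{ -\frac{1}{2\sigma_{k,j}^2} (x_{n,j} - \mu_{k,j})^2 \right\}$$
- More generally:
  $$p(x_n, y_n \mid \eta, \pi) = p(y_n \mid \pi) \times \prod_{j=1}^{m} p(x_{n,j} \mid y_n, \eta)$$
  - where p(.|.) is an arbitrary conditional (discrete or continuous) 1-D density
[Figure: graphical model with Y_n and the feature nodes X_{n,1}, X_{n,2}, …, X_{n,m}]

The predictive distribution
- Understanding the predictive distribution:
  $$p(y_n^k = 1 \mid x_n, \vec{\mu}, \Sigma, \pi) = \frac{p(y_n^k = 1, x_n \mid \vec{\mu}, \Sigma, \pi)}{p(x_n \mid \vec{\mu}, \Sigma, \pi)} = \frac{\pi_k N(x_n \mid \mu_k, \Sigma_k)}{\sum_{k'} \pi_{k'} N(x_n \mid \mu_{k'}, \Sigma_{k'})} \qquad (*)$$
- Under the naïve Bayes assumption:
  $$p(y_n^k = 1 \mid x_n, \vec{\mu}, \Sigma, \pi) = \frac{\pi_k \exp\left\{ -\sum_j \left( \frac{1}{2\sigma_{k,j}^2} (x_{n,j} - \mu_{k,j})^2 + \log \sigma_{k,j} + C \right) \right\}}{\sum_{k'} \pi_{k'} \exp\left\{ -\sum_j \left( \frac{1}{2\sigma_{k',j}^2} (x_{n,j} - \mu_{k',j})^2 + \log \sigma_{k',j} + C \right) \right\}} \qquad (**)$$
  where $C = \frac{1}{2} \log 2\pi$.
- For two classes (i.e., K=2), and when the two classes have the same variance ($\sigma_{1,j} = \sigma_{2,j} = \sigma_j$), (**) turns out to be a logistic function:
  $$p(y_n^1 = 1 \mid x_n) = \frac{1}{1 + \frac{\pi_2 \exp\left\{ -\sum_j \left( \frac{1}{2\sigma_j^2} (x_{n,j} - \mu_{2,j})^2 + \log \sigma_j + C \right) \right\}}{\pi_1 \exp\left\{ -\sum_j \left( \frac{1}{2\sigma_j^2} (x_{n,j} - \mu_{1,j})^2 + \log \sigma_j + C \right) \right\}}}$$
  $$= \frac{1}{1 + \exp\left\{ -\sum_j x_{n,j} \frac{\mu_{1,j} - \mu_{2,j}}{\sigma_j^2} + \sum_j \frac{\mu_{1,j}^2 - \mu_{2,j}^2}{2\sigma_j^2} + \log \frac{1 - \pi_1}{\pi_1} \right\}} = \frac{1}{1 + e^{-\theta^T x_n}}$$

The decision boundary
- The predictive distribution:
  $$p(y_n^1 = 1 \mid x_n) = \frac{1}{1 + \exp\left\{ -\sum_{j=1}^{m} \theta_j x_{n,j} - \theta_0 \right\}} = \frac{1}{1 + e^{-\theta^T x_n}}$$
- The Bayes decision rule:
  $$\ln \frac{p(y_n^1 = 1 \mid x_n)}{p(y_n^2 = 1 \mid x_n)} = \ln \left( \frac{\frac{1}{1 + e^{-\theta^T x_n}}}{\frac{e^{-\theta^T x_n}}{1 + e^{-\theta^T x_n}}} \right) = \theta^T x_n$$
- For multiple classes (i.e., K > 2), (*) corresponds to a softmax function:
  $$p(y_n^k = 1 \mid x_n) = \frac{e^{-\theta_k^T x_n}}{\sum_j e^{-\theta_j^T x_n}}$$

Generative vs. Discriminative Classifiers
- Goal: Wish to learn f: X → Y, e.g., P(Y|X)
- Generative classifiers (e.g., Naïve Bayes):
  - Assume some functional form for P(X|Y), P(Y). This is a 'generative' model of the data!
  - Estimate parameters of P(X|Y), P(Y) directly from training data
  - Use Bayes rule to calculate P(Y|X=x)
- Discriminative classifiers:
  - Directly assume some functional form for P(Y|X). This is a 'discriminative' model of the data!
  - Estimate parameters of P(Y|X) directly from training data
[Figure: two graphical models, each over the nodes Y_i and X_i]

Linear Regression
- The data:
  $$\{ (x_1, y_1), (x_2, y_2), (x_3, y_3), \dots, (x_N, y_N) \}$$
- Both nodes are observed:
  - X is an input vector
  - Y is a response vector (we first consider y as a generic continuous response vector, then we consider the special case of classification where y is a discrete indicator)
- A regression scheme can be used to model p(y|x) directly, rather than p(x,y)
[Figure: graphical model over the nodes Y_i and X_i inside a plate of size N]

Linear Regression
- Assume that Y (target) is a linear function of X (features):
  - e.g.: [equation missing]
  - Let's assume a vacuous feature $X_0 = 1$ (this is the intercept term, why?), and define the feature vector to be:
    $$\vec{x} = [x_0, x_1, \dots, x_m]^T$$
  - Then we have the following general representation of the linear function:
    $$\hat{y}(\vec{x}) = \theta^T \vec{x}$$
- Our goal is to pick the optimal $\theta$. How?
  - We seek the $\theta$ that minimizes the following cost function:
    $$J(\theta) = \frac{1}{2} \sum_{i=1}^{n} \left( \hat{y}_i(\vec{x}_i) - y_i \right)^2$$

The Least-Mean-Square (LMS) method
- Consider a gradient descent algorithm:
  $$\theta_j^{t+1} = \theta_j^t - \alpha \frac{\partial}{\partial \theta_j} J(\theta^t)$$
- Now we have the following descent rule:
  $$\theta_j^{t+1} = \theta_j^t + \alpha \sum_{i=1}^{n} \left( y_i - \vec{x}_i^T \theta^t \right) x_{i,j}$$
- For a single training point, we have:
  $$\theta_j^{t+1} = \theta_j^t + \alpha \left( y_i - \vec{x}_i^T \theta^t \right) x_{i,j}$$
  - This is known as the LMS update rule, or the Widrow-Hoff learning rule
  - This is actually a stochastic, coordinate descent algorithm
  - This can be used as an on-line algorithm

Probabilistic Interpretation of LMS
- Let us assume that the target variable and the inputs are related by the equation:
  $$y_i = \theta^T \vec{x}_i + \varepsilon_i$$
  where ε is an error term of unmodeled e…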
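
The two LMS descent rules on the slides (the batch sum over all n training points and the single-point Widrow-Hoff update) can be sketched in a few lines of plain Python. This is a minimal illustration, not the lecture's own code: the function names, the toy data set (y = 1 + 2x with the vacuous feature x_0 = 1), the step size alpha, and the iteration counts are all assumptions chosen so that both variants converge.

```python
import random


def predict(theta, x):
    """Linear prediction theta^T x for one feature vector x (x[0] is the constant 1)."""
    return sum(t * xj for t, xj in zip(theta, x))


def lms_batch(X, y, alpha=0.01, steps=2000):
    """Batch rule: theta_j <- theta_j + alpha * sum_i (y_i - theta^T x_i) * x_ij."""
    theta = [0.0] * len(X[0])
    for _ in range(steps):
        # Residuals are computed with the current theta before any coordinate changes.
        errors = [yi - predict(theta, xi) for xi, yi in zip(X, y)]
        theta = [t + alpha * sum(e * xi[j] for e, xi in zip(errors, X))
                 for j, t in enumerate(theta)]
    return theta


def lms_online(X, y, alpha=0.01, epochs=3000, seed=0):
    """Single-point (Widrow-Hoff) rule, applied to one shuffled example at a time."""
    rng = random.Random(seed)
    theta = [0.0] * len(X[0])
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            err = y[i] - predict(theta, X[i])
            theta = [t + alpha * err * xj for t, xj in zip(theta, X[i])]
    return theta


# Hypothetical noise-free toy data: y = 1 + 2 * x, with x_0 = 1 as the intercept feature.
X = [[1.0, float(v)] for v in range(5)]
y = [1.0 + 2.0 * row[1] for row in X]

theta_batch = lms_batch(X, y)
theta_online = lms_online(X, y)
print(theta_batch, theta_online)
```

Because the toy targets contain no noise, both variants should approach the generating coefficients (intercept 1, slope 2). With noisy data the single-point rule would instead hover around the least-squares solution unless alpha is decreased over time.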
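
Returning to the predictive-distribution slides: the claim that a two-class Gaussian naïve Bayes model with class-independent variances has a logistic posterior can be checked numerically. The sketch below is an illustration under assumptions, not part of the lecture. The parameter values are made up, and the closed form for θ is derived here from the Gaussian class-conditionals, taking θ_j = (μ_{1,j} − μ_{2,j}) / σ_j² and θ_0 = log(π_1 / (1 − π_1)) + Σ_j (μ_{2,j}² − μ_{1,j}²) / (2σ_j²).

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian with mean mu and standard deviation sigma."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)


def nb_posterior(x, pi1, mu1, mu2, sigma):
    """P(y = 1 | x) by Bayes rule, with per-feature Gaussians sharing sigma across classes."""
    joint1, joint2 = pi1, 1.0 - pi1
    for xj, m1, m2, s in zip(x, mu1, mu2, sigma):
        joint1 *= gaussian_pdf(xj, m1, s)
        joint2 *= gaussian_pdf(xj, m2, s)
    return joint1 / (joint1 + joint2)


def logistic_params(pi1, mu1, mu2, sigma):
    """Return [theta_0, theta_1, ..., theta_m] implied by the generative parameters."""
    theta0 = math.log(pi1 / (1.0 - pi1)) + sum(
        (m2 ** 2 - m1 ** 2) / (2 * s ** 2) for m1, m2, s in zip(mu1, mu2, sigma))
    weights = [(m1 - m2) / s ** 2 for m1, m2, s in zip(mu1, mu2, sigma)]
    return [theta0] + weights


def logistic_posterior(x, theta):
    """1 / (1 + exp(-theta^T x)), with the constant feature x_0 = 1 handled via theta[0]."""
    z = theta[0] + sum(t * xj for t, xj in zip(theta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


# Made-up parameters for illustration only.
pi1 = 0.3
mu1, mu2 = [1.0, -0.5], [-1.0, 2.0]
sigma = [1.5, 0.7]
x = [0.2, 0.4]

p_nb = nb_posterior(x, pi1, mu1, mu2, sigma)
p_logit = logistic_posterior(x, logistic_params(pi1, mu1, mu2, sigma))
print(p_nb, p_logit)
```

The two printed values should agree up to floating-point error. The agreement relies on the shared variances: if σ differed between classes, quadratic terms in x would survive in the log-odds and the posterior would no longer be a logistic function of a linear score.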
Document title: 74 CMU Advanced Machine Learning: Discriminative Classifiers
Link: https://www.777doc.com/doc-5450514 .html