CS229 Machine Learning (Problems and Answers), Stanford University

Contents
(1) Homework 1 (Supervised Learning)
(2) Homework 1 Solutions (Supervised Learning)
(3) Homework 2 (Kernels, SVMs, and Theory)
(4) Homework 2 Solutions (Kernels, SVMs, and Theory)
(5) Homework 3 (Learning Theory and Unsupervised Learning)
(6) Homework 3 Solutions (Learning Theory and Unsupervised Learning)
(7) Homework 4 (Unsupervised Learning and Reinforcement Learning)
(8) Homework 4 Solutions (Unsupervised Learning and Reinforcement Learning)
(9) Problem Set #1: Supervised Learning
(10) Problem Set #1 Answer
(11) Problem Set #2: Naive Bayes, SVMs, and Theory
(12) Problem Set #2 Answer

CS229, Public Course
Problem Set #1: Supervised Learning

1. Newton's method for computing least squares

In this problem, we will prove that if we use Newton's method to solve the least squares optimization problem, then we only need one iteration to converge to $\theta^*$.

(a) Find the Hessian of the cost function

$$J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( \theta^T x^{(i)} - y^{(i)} \right)^2.$$

(b) Show that the first iteration of Newton's method gives us $\theta^* = (X^T X)^{-1} X^T \vec{y}$, the solution to our least squares problem.

2. Locally-weighted logistic regression

In this problem you will implement a locally-weighted version of logistic regression, where we weight different training examples differently according to the query point. The locally-weighted logistic regression problem is to maximize

$$\ell(\theta) = -\frac{\lambda}{2} \theta^T \theta + \sum_{i=1}^{m} w^{(i)} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right].$$

The $-\frac{\lambda}{2} \theta^T \theta$ here is what is known as a regularization term, which will be discussed in a future lecture, but which we include here because it is needed for Newton's method to perform well on this task. For the entirety of this problem you can use the value $\lambda = 0.0001$.

Using this definition, the gradient of $\ell(\theta)$ is given by

$$\nabla_\theta \ell(\theta) = X^T z - \lambda \theta,$$

where $z \in \mathbb{R}^m$ is defined by $z_i = w^{(i)} \left( y^{(i)} - h_\theta(x^{(i)}) \right)$, and the Hessian is given by

$$H = X^T D X - \lambda I,$$

where $D \in \mathbb{R}^{m \times m}$ is a diagonal matrix with $D_{ii} = -w^{(i)} h_\theta(x^{(i)}) \left( 1 - h_\theta(x^{(i)}) \right)$. For the sake of this problem you can just use the above formulas, but you should try to derive these results for yourself as well.

Given a query point $x$, we choose to compute the weights

$$w^{(i)} = \exp\left( -\frac{\| x - x^{(i)} \|^2}{2 \tau^2} \right).$$

Much like the locally weighted linear regression that was discussed in class, this weighting scheme gives more weight to the "nearby" points when predicting the class of a new example.

(a) Implement the Newton-Raphson algorithm for optimizing $\ell(\theta)$ for a new query point $x$, and use this to predict the class of $x$.

The q2/ directory contains data and code for this problem. You should implement the y = lwlr(X_train, y_train, x, tau) function in the lwlr.m file. This function takes as input the training set (the X_train and y_train matrices, in the form described in the class notes), a new query point x, and the weight bandwidth tau. Given this input, the function should 1) compute weights $w^{(i)}$ for each training example, using the formula above, 2) maximize $\ell(\theta)$ using Newton's method, and finally 3) output $y = 1\{ h_\theta(x) > 0.5 \}$ as the prediction. (A sketch of one possible implementation is given after part (b) below.)

We provide two additional functions that might help. The [X_train, y_train] = load_data; function will load the matrices from files in the data/ folder. The function plot_lwlr(X_train, y_train, tau, resolution) will plot the resulting classifier (assuming you have properly implemented lwlr.m). This function evaluates the locally weighted logistic regression classifier over a large grid of points and plots the resulting prediction as blue (predicting $y = 0$) or red (predicting $y = 1$). Depending on how fast your lwlr function is, creating the plot might take some time, so we recommend debugging your code with resolution = 50; and later increasing it to at least 200 to get a better idea of the decision boundary.

(b) Evaluate the system with a variety of different bandwidth parameters $\tau$. In particular, try $\tau = 0.01, 0.05, 0.1, 0.5, 1.0, 5.0$. How does the classification boundary change when varying this parameter? Can you predict what the decision boundary of ordinary (unweighted) logistic regression would look like?
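For reference, the following is a minimal Octave/MATLAB sketch of the lwlr.m function described in part 2(a); it is not the official solution from the answer collection. It simply plugs the gradient and Hessian formulas given in the problem into the Newton update $\theta := \theta - H^{-1} \nabla_\theta \ell(\theta)$. The zero initialization and the fixed iteration budget are illustrative assumptions; $\lambda = 0.0001$ is the value specified in the problem, and the query point x is taken to be an n-by-1 column vector.

    function y = lwlr(X_train, y_train, x, tau)
    % LWLR - illustrative sketch of locally weighted logistic regression
    % (not the official solution). Uses the gradient and Hessian formulas
    % from the problem statement; x is an n-by-1 query point.
    [m, n] = size(X_train);
    lambda = 0.0001;                     % value specified in the problem

    % 1) weights w^(i) = exp(-||x - x^(i)||^2 / (2 tau^2))
    w = exp(-sum((X_train - repmat(x', m, 1)).^2, 2) / (2 * tau^2));

    % 2) maximize l(theta) with Newton's method
    theta = zeros(n, 1);                 % illustrative initialization
    for iter = 1:20                      % illustrative fixed iteration budget
      h = 1 ./ (1 + exp(-X_train * theta));          % h_theta(x^(i)) for all i
      z = w .* (y_train - h);
      grad = X_train' * z - lambda * theta;          % gradient given above
      D = diag(-w .* h .* (1 - h));
      H = X_train' * D * X_train - lambda * eye(n);  % Hessian given above
      theta = theta - H \ grad;                      % Newton update
    end

    % 3) output y = 1{h_theta(x) > 0.5}
    y = double(1 / (1 + exp(-theta' * x)) > 0.5);

Newton's method typically converges in only a few iterations on logistic-regression-style objectives, which is why a small fixed budget serves as a placeholder here; a more careful version would instead stop when the change in theta falls below a tolerance.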
3. Multivariate least squares

So far in class, we have only considered cases where our target variable $y$ is a scalar value. Suppose that instead of trying to predict a single output, we have a training set with multiple outputs for each example:

$$\{ (x^{(i)}, y^{(i)}),\ i = 1, \ldots, m \}, \quad x^{(i)} \in \mathbb{R}^n,\ y^{(i)} \in \mathbb{R}^p.$$

Thus for each training example, $y^{(i)}$ is vector-valued, with $p$ entries. We wish to use a linear model to predict the outputs, as in least squares, by specifying the parameter matrix $\Theta$ in

$$y = \Theta^T x,$$

where $\Theta \in \mathbb{R}^{n \times p}$.

(a) The cost function for this case is

$$J(\Theta) = \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{p} \left( (\Theta^T x^{(i)})_j - y_j^{(i)} \right)^2.$$

Write $J(\Theta)$ in matrix-vector notation (i.e., without using any summations). [Hint: Start with the $m \times n$ design matrix

$$X = \begin{bmatrix} \text{---}\ (x^{(1)})^T\ \text{---} \\ \text{---}\ (x^{(2)})^T\ \text{---} \\ \vdots \\ \text{---}\ (x^{(m)})^T\ \text{---} \end{bmatrix}$$

and the $m \times p$ target matrix

$$Y = \begin{bmatrix} \text{---}\ (y^{(1)})^T\ \text{---} \\ \text{---}\ (y^{(2)})^T\ \text{---} \\ \vdots \\ \text{---}\ (y^{(m)})^T\ \text{---} \end{bmatrix}$$

and then work out how to express $J(\Theta)$ in terms of these matrices.]

(b) Find the closed form solution for $\Theta$ which minimizes $J(\Theta)$. This is the equivalent of the normal equations for the multivariate case.

(c) Suppose instead of considering the multivariate vectors $y^{(i)}$ all at once, we instead compute each variable $y_j^{(i)}$ separately for each $j = 1, \ldots, p$. In this case, we have $p$ individual linear models, of the form

$$y_j^{(i)} = \theta_j^T x^{(i)}, \quad j = 1, \ldots, p.$$

(So here, each $\theta_j \in \mathbb{R}^n$.) How do the parameters from these $p$ independent least squares problems compare to the multivariate solution?

4. Naive Bayes

In this problem, we look at maximum likelihood parameter estimation using the naive Bayes assumption. Here, the input features $x_j$, $j = 1, \ldots, n$ to our model are discrete, binary-valued variables, so $x_j \in \{0, 1\}$. We call $x = [x_1\ x_2 \cdots x_n]^T$ the input vector. For each training example, our output target is a single binary value $y \in \{0, 1\}$. Our model is then parameterized by $\phi_{j|y=0} = p(x_j = 1 \mid y = 0)$, $\phi_{j|y=1} = p(x_j = 1 \mid y = 1)$, and $\phi_y = p(y = 1)$. We model the joint distribution of $(x, y)$ according to

$$p(y) = (\phi_y)^y (1 - \phi_y)^{1-y}$$

$$p(x \mid y = 0) = \prod_{j=1}^{n} p(x_j \mid y = 0) = \prod_{j=1}^{n} (\phi_{j|y=0})^{x_j} (1 - \phi_{j|y=0})^{1 - x_j}$$

$$p(x \mid y = 1) = \prod_{j=1}^{n} p(x_j \mid y = 1) = \prod_{j=1}^{n} (\phi_{j|y=1})^{x_j} (1 - \phi_{j|y=1})^{1 - x_j}$$
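As a companion to this model definition, here is a small Octave/MATLAB sketch (not part of the original handout) showing how the standard maximum likelihood estimates for a Bernoulli naive Bayes model of this form reduce to frequency counts. The names X (an assumed m-by-n binary feature matrix, one example per row) and y (an assumed m-by-1 binary label vector) are illustrative.

    function [phi_j_y0, phi_j_y1, phi_y] = nb_mle(X, y)
    % NB_MLE - illustrative sketch (not from the handout): maximum likelihood
    % estimates for the Bernoulli naive Bayes parameterization above.
    % X: assumed m-by-n binary matrix; y: assumed m-by-1 binary vector.
    m = size(X, 1);
    phi_y = sum(y == 1) / m;                         % estimate of p(y = 1)
    % phi_{j|y=1}: fraction of y = 1 examples with x_j = 1 (n-by-1 vector)
    phi_j_y1 = sum(X(y == 1, :), 1)' / sum(y == 1);
    % phi_{j|y=0}: fraction of y = 0 examples with x_j = 1 (n-by-1 vector)
    phi_j_y0 = sum(X(y == 0, :), 1)' / sum(y == 0);

These counting estimates are what maximizing the joint log-likelihood yields under this parameterization; Laplace smoothing (adding 1 to each numerator count and 2 to each denominator) is the usual guard against zero-probability estimates.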