CS229 Machine Learning (Problems and Answers), Stanford University

Contents
(1) Homework 1 (Supervised Learning)
(2) Homework 1 Solutions (Supervised Learning)
(3) Homework 2 (Kernels, SVMs, and Theory)
(4) Homework 2 Solutions (Kernels, SVMs, and Theory)
(5) Homework 3 (Learning Theory and Unsupervised Learning)
(6) Homework 3 Solutions (Learning Theory and Unsupervised Learning)
(7) Homework 4 (Unsupervised Learning and Reinforcement Learning)
(8) Homework 4 Solutions (Unsupervised Learning and Reinforcement Learning)
(9) Problem Set #1: Supervised Learning
(10) Problem Set #1 Answer
(11) Problem Set #2: Naive Bayes, SVMs, and Theory
(12) Problem Set #2 Answer

CS229, Public Course
Problem Set #1: Supervised Learning

1. Newton's method for computing least squares

In this problem, we will prove that if we use Newton's method to solve the least squares optimization problem, then we only need one iteration to converge to $\theta^*$.

(a) Find the Hessian of the cost function

$$J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( \theta^T x^{(i)} - y^{(i)} \right)^2.$$

(b) Show that the first iteration of Newton's method gives us $\theta^* = (X^T X)^{-1} X^T \vec{y}$, the solution to our least squares problem.

2. Locally-weighted logistic regression

In this problem you will implement a locally-weighted version of logistic regression, where we weight different training examples differently according to the query point. The locally-weighted logistic regression problem is to maximize

$$\ell(\theta) = -\frac{\lambda}{2} \theta^T \theta + \sum_{i=1}^{m} w^{(i)} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right].$$

The $-\frac{\lambda}{2} \theta^T \theta$ here is what is known as a regularization term, which will be discussed in a future lecture, but which we include here because it is needed for Newton's method to perform well on this task. For the entirety of this problem you can use the value $\lambda = 0.0001$.

Using this definition, the gradient of $\ell(\theta)$ is given by

$$\nabla_\theta \ell(\theta) = X^T z - \lambda \theta,$$

where $z \in \mathbb{R}^m$ is defined by $z_i = w^{(i)} \left( y^{(i)} - h_\theta(x^{(i)}) \right)$, and the Hessian is given by

$$H = X^T D X - \lambda I,$$

where $D \in \mathbb{R}^{m \times m}$ is a diagonal matrix with $D_{ii} = -w^{(i)} h_\theta(x^{(i)}) \left( 1 - h_\theta(x^{(i)}) \right)$. For the sake of this problem you can just use the above formulas, but you should try to derive these results for yourself as well.

Given a query point $x$, we choose to compute the weights

$$w^{(i)} = \exp\left( -\frac{\| x - x^{(i)} \|^2}{2 \tau^2} \right).$$

Much like the locally weighted linear regression that was discussed in class, this weighting scheme gives more weight to the "nearby" points when predicting the class of a new example.

(a) Implement the Newton-Raphson algorithm for optimizing $\ell(\theta)$ for a new query point $x$, and use this to predict the class of $x$.

The q2/ directory contains data and code for this problem. You should implement the y = lwlr(X_train, y_train, x, tau) function in the lwlr.m file. This function takes as input the training set (the X_train and y_train matrices, in the form described in the class notes), a new query point x, and the weight bandwidth tau. Given this input, the function should 1) compute weights $w^{(i)}$ for each training example, using the formula above, 2) maximize $\ell(\theta)$ using Newton's method, and finally 3) output $y = 1\{ h_\theta(x) > 0.5 \}$ as the prediction. (A sketch of one possible implementation is given after part (b) below.)

We provide two additional functions that might help. The [X_train, y_train] = load_data; function will load the matrices from files in the data/ folder. The function plot_lwlr(X_train, y_train, tau, resolution) will plot the resulting classifier (assuming you have properly implemented lwlr.m). This function evaluates the locally weighted logistic regression classifier over a large grid of points and plots the resulting prediction as blue (predicting $y = 0$) or red (predicting $y = 1$). Depending on how fast your lwlr function is, creating the plot might take some time, so we recommend debugging your code with resolution = 50; and later increasing it to at least 200 to get a better idea of the decision boundary.

(b) Evaluate the system with a variety of different bandwidth parameters $\tau$. In particular, try $\tau = 0.01, 0.05, 0.1, 0.5, 1.0, 5.0$. How does the classification boundary change when varying this parameter? Can you predict what the decision boundary of ordinary (unweighted) logistic regression would look like?
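For reference, the following is a minimal Octave/MATLAB sketch of the lwlr.m function described in part 2(a); it is not the official solution from the answer collection. It simply plugs the gradient and Hessian formulas given in the problem into the Newton update $\theta := \theta - H^{-1} \nabla_\theta \ell(\theta)$. The zero initialization and the fixed iteration budget are illustrative assumptions; $\lambda = 0.0001$ is the value specified in the problem, and the query point x is taken to be an n-by-1 column vector.

    function y = lwlr(X_train, y_train, x, tau)
    % LWLR - illustrative sketch of locally weighted logistic regression
    % (not the official solution). Uses the gradient and Hessian formulas
    % from the problem statement; x is an n-by-1 query point.
    [m, n] = size(X_train);
    lambda = 0.0001;                     % value specified in the problem

    % 1) weights w^(i) = exp(-||x - x^(i)||^2 / (2 tau^2))
    w = exp(-sum((X_train - repmat(x', m, 1)).^2, 2) / (2 * tau^2));

    % 2) maximize l(theta) with Newton's method
    theta = zeros(n, 1);                 % illustrative initialization
    for iter = 1:20                      % illustrative fixed iteration budget
      h = 1 ./ (1 + exp(-X_train * theta));          % h_theta(x^(i)) for all i
      z = w .* (y_train - h);
      grad = X_train' * z - lambda * theta;          % gradient given above
      D = diag(-w .* h .* (1 - h));
      H = X_train' * D * X_train - lambda * eye(n);  % Hessian given above
      theta = theta - H \ grad;                      % Newton update
    end

    % 3) output y = 1{h_theta(x) > 0.5}
    y = double(1 / (1 + exp(-theta' * x)) > 0.5);

Newton's method typically converges in only a few iterations on logistic-regression-style objectives, which is why a small fixed budget serves as a placeholder here; a more careful version would instead stop when the change in theta falls below a tolerance.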
3. Multivariate least squares

So far in class, we have only considered cases where our target variable $y$ is a scalar value. Suppose that instead of trying to predict a single output, we have a training set with multiple outputs for each example:

$$\{ (x^{(i)}, y^{(i)}),\ i = 1, \ldots, m \}, \quad x^{(i)} \in \mathbb{R}^n,\ y^{(i)} \in \mathbb{R}^p.$$

Thus for each training example, $y^{(i)}$ is vector-valued, with $p$ entries. We wish to use a linear model to predict the outputs, as in least squares, by specifying the parameter matrix $\Theta$ in

$$y = \Theta^T x,$$

where $\Theta \in \mathbb{R}^{n \times p}$.

(a) The cost function for this case is

$$J(\Theta) = \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{p} \left( (\Theta^T x^{(i)})_j - y_j^{(i)} \right)^2.$$

Write $J(\Theta)$ in matrix-vector notation (i.e., without using any summations). [Hint: Start with the $m \times n$ design matrix

$$X = \begin{bmatrix} \text{---}\ (x^{(1)})^T\ \text{---} \\ \text{---}\ (x^{(2)})^T\ \text{---} \\ \vdots \\ \text{---}\ (x^{(m)})^T\ \text{---} \end{bmatrix}$$

and the $m \times p$ target matrix

$$Y = \begin{bmatrix} \text{---}\ (y^{(1)})^T\ \text{---} \\ \text{---}\ (y^{(2)})^T\ \text{---} \\ \vdots \\ \text{---}\ (y^{(m)})^T\ \text{---} \end{bmatrix}$$

and then work out how to express $J(\Theta)$ in terms of these matrices.]

(b) Find the closed form solution for $\Theta$ which minimizes $J(\Theta)$. This is the equivalent of the normal equations for the multivariate case.

(c) Suppose instead of considering the multivariate vectors $y^{(i)}$ all at once, we instead compute each variable $y_j^{(i)}$ separately for each $j = 1, \ldots, p$. In this case, we have $p$ individual linear models, of the form

$$y_j^{(i)} = \theta_j^T x^{(i)}, \quad j = 1, \ldots, p.$$

(So here, each $\theta_j \in \mathbb{R}^n$.) How do the parameters from these $p$ independent least squares problems compare to the multivariate solution?

4. Naive Bayes

In this problem, we look at maximum likelihood parameter estimation using the naive Bayes assumption. Here, the input features $x_j$, $j = 1, \ldots, n$ to our model are discrete, binary-valued variables, so $x_j \in \{0, 1\}$. We call $x = [x_1\ x_2 \cdots x_n]^T$ the input vector. For each training example, our output target is a single binary value $y \in \{0, 1\}$. Our model is then parameterized by $\phi_{j|y=0} = p(x_j = 1 \mid y = 0)$, $\phi_{j|y=1} = p(x_j = 1 \mid y = 1)$, and $\phi_y = p(y = 1)$. We model the joint distribution of $(x, y)$ according to

$$p(y) = (\phi_y)^y (1 - \phi_y)^{1-y}$$

$$p(x \mid y = 0) = \prod_{j=1}^{n} p(x_j \mid y = 0) = \prod_{j=1}^{n} (\phi_{j|y=0})^{x_j} (1 - \phi_{j|y=0})^{1 - x_j}$$

$$p(x \mid y = 1) = \prod_{j=1}^{n} p(x_j \mid y = 1) = \prod_{j=1}^{n} (\phi_{j|y=1})^{x_j} (1 - \phi_{j|y=1})^{1 - x_j}$$
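As a companion to this model definition, here is a small Octave/MATLAB sketch (not part of the original handout) showing how the standard maximum likelihood estimates for a Bernoulli naive Bayes model of this form reduce to frequency counts. The names X (an assumed m-by-n binary feature matrix, one example per row) and y (an assumed m-by-1 binary label vector) are illustrative.

    function [phi_j_y0, phi_j_y1, phi_y] = nb_mle(X, y)
    % NB_MLE - illustrative sketch (not from the handout): maximum likelihood
    % estimates for the Bernoulli naive Bayes parameterization above.
    % X: assumed m-by-n binary matrix; y: assumed m-by-1 binary vector.
    m = size(X, 1);
    phi_y = sum(y == 1) / m;                         % estimate of p(y = 1)
    % phi_{j|y=1}: fraction of y = 1 examples with x_j = 1 (n-by-1 vector)
    phi_j_y1 = sum(X(y == 1, :), 1)' / sum(y == 1);
    % phi_{j|y=0}: fraction of y = 0 examples with x_j = 1 (n-by-1 vector)
    phi_j_y0 = sum(X(y == 0, :), 1)' / sum(y == 0);

These counting estimates are what maximizing the joint log-likelihood yields under this parameterization; Laplace smoothing (adding 1 to each numerator count and 2 to each denominator) is the usual guard against zero-probability estimates.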