Network Representation of a Classifier

Objective Function

- Given example pairs $(\mathbf{x}_i, y_i)$, learn a function $f(\mathbf{x})$, $f(\mathbf{x}) = \mathbf{w}^T\mathbf{x}$.
- Loss: $L(y, f(\mathbf{x}))$.
- Expected loss: $E[L] = \int L(y, f(\mathbf{x}))\, p(\mathbf{x}, y)\, d\mathbf{x}\, dy$.
- Different algorithms design different loss functions: least squares (regression), SVM.

Linear Discriminant Function

- Given training data $(\mathbf{x}_i, y_i)$, $i = 1, \dots, n$, we are trying to find $g(\mathbf{x}) = \mathbf{w}^T\mathbf{x}$.
- Regression: minimize the mean-squared error (with a regularizer):
  $J = \frac{1}{n}\sum_{i=1}^{n}(y_i - g(\mathbf{x}_i))^2 + \lambda\|\mathbf{w}\|^2$
- SVM: large margin:
  $\min_{\mathbf{w},b}\ \frac{1}{2}\|\mathbf{w}\|^2$  s.t.  $y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1$

Minimize the Number of Misclassifications

- Error function: the number of misclassifications,
  $E(\mathbf{a}) = \sum_i \big(\operatorname{sgn}(\mathbf{a}^T\mathbf{x}_i) - y_i\big)^2$
  (each misclassified sample contributes a constant 4).
- This is a piecewise constant function of $\mathbf{a}$ with discontinuities: difficult to optimize, and there is no closed-form solution.
- For any misclassified $\mathbf{x}_i$ we have $\mathbf{a}^T\mathbf{x}_i\, y_i \le 0$, which motivates the surrogate
  $E(\mathbf{a}) = -\sum_{i \in \mathcal{M}} \mathbf{a}^T\mathbf{x}_i\, y_i$,
  where $\mathcal{M}$ is the set of misclassified samples.

The Perceptron Criterion Function

- Goal: find a weight vector $\mathbf{a}$ such that $\mathbf{a}^T\mathbf{x}_i\, y_i > 0$ for all the examples (assuming such a vector exists).
- Perceptron criterion function: $J_p(\mathbf{a}) = \sum_{i \in \mathcal{M}} (-\mathbf{a}^T\mathbf{x}_i\, y_i)$.
- Its minimization is mathematically tractable, and hence it is a better criterion function than the number of misclassifications.
- The criterion is proportional to the sum of distances from the misclassified samples to the decision boundary.
- Find an $\mathbf{a}$ that minimizes this criterion.

Perceptron

Frank Rosenblatt (1928–1971), computer scientist. Born in New York; bachelor's degree from Cornell.
(Photo: Frank Rosenblatt with the Mark I Perceptron in 1958.)

Gradient Descent

- The perceptron criterion function $J(\mathbf{a})$ is a scalar function of a vector variable; it can be minimized using gradient descent.
- Gradient descent procedure:
  - Start with an arbitrarily chosen weight vector $\mathbf{a}(1)$ and compute the gradient vector $\nabla J(\mathbf{a}(1))$.
  - The next value $\mathbf{a}(2)$ is obtained by moving in the direction of steepest descent, i.e., along the negative of the gradient.
  - In general, the $(k+1)$-th solution is obtained by $\mathbf{a}(k+1) = \mathbf{a}(k) - \eta(k)\,\nabla J(\mathbf{a}(k))$.

Gradient Descent Algorithm

(Figure: successive iterates $a_1, a_2, a_3, a_4$ descending the curve $L(a)$ toward its minimum.)

Optimization Algorithm

- Use gradient descent to find $\mathbf{a}$: move in the negative direction of the gradient iteratively to reach the minimum.
- The gradient vector is given by $\nabla J_p(\mathbf{a}) = -\sum_{i \in \mathcal{M}} y_i\,\mathbf{x}_i$.
- Starting from $\mathbf{a} = \mathbf{0}$, update $\mathbf{a}$ at each iteration $k$ as follows:
  $\mathbf{a}(k+1) = \mathbf{a}(k) + \eta(k) \sum_{i \in \mathcal{M}_k} y_i\,\mathbf{x}_i$
  where $\mathbf{a}(k+1)$ is the updated $\mathbf{a}$, $\mathbf{a}(k)$ the current $\mathbf{a}$, $\sum_{i \in \mathcal{M}_k} y_i\,\mathbf{x}_i$ the direction of update, and the learning rate (step size) $\eta(k)$ the magnitude of the update.

Batch Perceptron

- Uses all the samples to compute the gradient direction.
- Iterative update step (note the gradient): $\mathbf{a}(k+1) = \mathbf{a}(k) + \eta(k) \sum_{\mathbf{x}_i \in \mathcal{M}_k} y_i\,\mathbf{x}_i$.
- Stopping criterion: the update magnitude falls below a threshold, $\big\|\eta(k)\sum_{\mathbf{x}_i \in \mathcal{M}_k} y_i\,\mathbf{x}_i\big\| < \theta$.

Batch Learning vs. Online Learning

- Expected loss: $E[L] = \int L(y, f(\mathbf{x}))\, p(\mathbf{x}, y)\, d\mathbf{x}\, dy$.
- Empirical loss: $\mathcal{L} = \frac{1}{n}\sum_{i=1}^{n} L(y_i, f(\mathbf{x}_i))$.
- Batch learning: all the training samples are available.
- Online learning: the learning algorithm sees the training samples one by one.

Online Learning Framework

- Initialization: $f_0(\mathbf{x})$.
- The algorithm works in rounds $t = 1, 2, 3, \dots$. On round $t$, the algorithm:
  1. Receives an instance $\mathbf{x}_t$.
  2. Outputs a prediction $\hat{y}_t = f_t(\mathbf{x}_t)$.
  3. Receives the true label $y_t$.
  4. Suffers a loss $l(\hat{y}_t, y_t)$.
  5. Updates the prediction rule: $f_{t+1} \leftarrow f_t$.
- Goal: minimize the cumulative loss $\sum_t l(\hat{y}_t, y_t)$.

Fixed-Increment Single-Sample Perceptron

- Computes the gradient using a single sample; this is perceptron learning in an online setting.
- For large datasets it is much more efficient than batch descent: computing the gradient costs $O(n)$ per batch update but only $O(1)$ per single-sample update.
- Iterative update step (note the gradient): on a misclassified sample $(\mathbf{x}_k, y_k)$,
  $\mathbf{a}(k+1) = \mathbf{a}(k) + y_k\,\mathbf{x}_k$.
- A runnable sketch of both the batch and single-sample updates follows the next slide.

Geometrical Interpretation

(Figure: on a mistake, the weight vector $\mathbf{w}_t$ is updated to $\mathbf{w}_{t+1} = \mathbf{w}_t + y_t\mathbf{x}_t$, rotating it toward the correctly labeled side of $\mathbf{x}_t$.)
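To make the two update rules concrete, here is a minimal sketch in Python/NumPy. It is my own illustration, not code from the slides: the function names, the toy-data construction, and the bias-absorbing augmentation are assumptions.

import numpy as np

def single_sample_perceptron(X, y, max_epochs=100):
    """Fixed-increment single-sample perceptron (eta = 1).
    X: (n, p) augmented feature vectors; y: labels in {-1, +1}."""
    a = np.zeros(X.shape[1])                 # a(1) = (0, ..., 0)
    for _ in range(max_epochs):
        converged = True
        for x_i, y_i in zip(X, y):
            if y_i * (a @ x_i) <= 0:         # mistake: wrong side or on the boundary
                a += y_i * x_i               # a(k+1) = a(k) + y_k x_k
                converged = False
        if converged:                        # a full pass with no mistakes
            break
    return a

def batch_perceptron(X, y, eta=1.0, theta=1e-6, max_iters=1000):
    """Batch perceptron: each step sums over all misclassified samples."""
    a = np.zeros(X.shape[1])
    for _ in range(max_iters):
        miscl = y * (X @ a) <= 0             # mask of misclassified samples
        step = eta * (y[miscl][:, None] * X[miscl]).sum(axis=0)
        if np.linalg.norm(step) < theta:     # stopping criterion ||step|| < theta
            break
        a += step                            # a(k+1) = a(k) + eta * sum y_i x_i
    return a

# Toy usage: linearly separable 2-D data with an enforced margin.
rng = np.random.default_rng(0)
w_true = np.array([1.0, 1.0])
X = rng.normal(size=(100, 2))
y = np.where(X @ w_true > 0, 1, -1)
X += 0.3 * y[:, None] * w_true               # push points away from the boundary
X_aug = np.hstack([X, np.ones((100, 1))])    # absorb the bias into a
a = single_sample_perceptron(X_aug, y)
print("training errors:", int((np.sign(X_aug @ a) != y).sum()))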
Perceptron Convergence

- Perceptron Convergence Theorem: if there exists a set of weights consistent with the data (i.e., the data is linearly separable), the perceptron learning algorithm will converge.
  -- How long would it take to converge?
- Perceptron Cycling Theorem: if the training data is not linearly separable, the perceptron learning algorithm will eventually repeat the same set of weights and therefore enter an infinite loop.
  -- How to provide robustness and more expressivity?

Perceptron: Mistake Bound Theorem

The perceptron maintains a weight vector $\mathbf{a} \in \mathbb{R}^p$, $\mathbf{a}_1 = (0, \dots, 0)$. Upon receiving an example $\mathbf{x} \in \mathbb{R}^p$, it predicts according to the linear threshold function $\operatorname{sgn}(\mathbf{a}^T\mathbf{x})$.

Theorem [Novikoff, 1963]: Let $(\mathbf{x}_1, y_1), \dots, (\mathbf{x}_t, y_t)$ be a sequence of labeled examples with $\mathbf{x}_i \in \mathbb{R}^p$, $\|\mathbf{x}_i\| \le R$, and $y_i \in \{-1, 1\}$ for all $i$. Let $\mathbf{u} \in \mathbb{R}^p$, $\gamma > 0$ be such that $y_i\,\mathbf{u}^T\mathbf{x}_i \ge \gamma$ for all $i$. Then the perceptron makes at most $\|\mathbf{u}\|^2 R^2 / \gamma^2$ mistakes on this example sequence.

The bound can be read as $\frac{\max_i \|\mathbf{x}_i\|^2\, \|\mathbf{u}\|^2}{\min_i (y_i\,\mathbf{u}^T\mathbf{x}_i)^2}$: the denominator is the (squared) margin, and the whole ratio is a complexity parameter of the problem.

Perceptron: Mistake Bound

Proof: let $\mathbf{u}$ be any solution vector, so that $y_i\,\mathbf{u}^T\mathbf{x}_i$ is strictly positive for all $i$, and let $\alpha$ be a positive scale factor. When the $k$-th mistake is made on sample $(\mathbf{x}_k, y_k)$:

$\mathbf{a}_{k+1} - \alpha\mathbf{u} = (\mathbf{a}_k - \alpha\mathbf{u}) + y_k\mathbf{x}_k$
$\|\mathbf{a}_{k+1} - \alpha\mathbf{u}\|^2 = \|\mathbf{a}_k - \alpha\mathbf{u}\|^2 + 2(\mathbf{a}_k - \alpha\mathbf{u})^T y_k\mathbf{x}_k + \|\mathbf{x}_k\|^2$

Since $\mathbf{x}_k$ was misclassified, $y_k\,\mathbf{a}_k^T\mathbf{x}_k \le 0$, and therefore

$\|\mathbf{a}_{k+1} - \alpha\mathbf{u}\|^2 \le \|\mathbf{a}_k - \alpha\mathbf{u}\|^2 - 2\alpha\, y_k\,\mathbf{u}^T\mathbf{x}_k + \|\mathbf{x}_k\|^2$
$\|\mathbf{a}_{k+1} - \alpha\mathbf{u}\|^2 \le \|\mathbf{a}_k - \alpha\mathbf{u}\|^2 - 2\alpha\gamma + R^2$

Let $\alpha = R^2/\gamma$; then $-2\alpha\gamma + R^2 = -R^2$, so after $k$ mistakes

$\|\mathbf{a}_{k+1} - \alpha\mathbf{u}\|^2 \le \|\mathbf{a}_1 - \alpha\mathbf{u}\|^2 - kR^2$

The left-hand side is nonnegative, so the number of mistakes is at most $k_{max} = \|\mathbf{a}_1 - \alpha\mathbf{u}\|^2 / R^2$; with $\mathbf{a}_1 = \mathbf{0}$ this gives $k_{max} = \alpha^2\|\mathbf{u}\|^2 / R^2 = \|\mathbf{u}\|^2 R^2 / \gamma^2$. ∎
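The bound is easy to check numerically. The sketch below (again my own illustration: the data construction and variable names are assumptions, not from the slides) builds a separable dataset with a known separator $\mathbf{u}$ and an enforced margin, runs the fixed-increment perceptron until a mistake-free pass, and compares the mistake count with Novikoff's bound $\|\mathbf{u}\|^2 R^2/\gamma^2$.

import numpy as np

rng = np.random.default_rng(1)
u = np.array([1.0, -2.0, 0.5])                   # a known separating vector
X = rng.normal(size=(500, 3))
y = np.where(X @ u > 0, 1, -1)
X += 0.2 * y[:, None] * u / np.linalg.norm(u)    # enforce a positive margin

R = np.linalg.norm(X, axis=1).max()              # R     = max_i ||x_i||
gamma = (y * (X @ u)).min()                      # gamma = min_i y_i u^T x_i
bound = np.linalg.norm(u) ** 2 * R ** 2 / gamma ** 2

a, mistakes, converged = np.zeros(3), 0, False
while not converged:                             # termination is guaranteed:
    converged = True                             # the data is separable
    for x_i, y_i in zip(X, y):
        if y_i * (a @ x_i) <= 0:                 # mistake
            a += y_i * x_i                       # fixed-increment update
            mistakes += 1
            converged = False

print(f"mistakes = {mistakes}, Novikoff bound = {bound:.1f}")

On a typical run the perceptron makes far fewer mistakes than the worst-case bound allows; the theorem only guarantees the count can never exceed it.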