Chapter 3
SVM: Support Vector Machines
Hui Xue, Qiang Yang, and Songcan Chen

Contents
3.1 Support Vector Classifier
3.2 SVC with Soft Margin and Optimization
3.3 Kernel Trick
3.4 Theoretical Foundations
3.5 Support Vector Regressor
3.6 Software Implementations
3.7 Current and Future Research
    3.7.1 Computational Efficiency
    3.7.2 Kernel Selection
    3.7.3 Generalization Analysis
    3.7.4 Structural SVM Learning
3.8 Exercises
References

Support vector machines (SVMs), including the support vector classifier (SVC) and the support vector regressor (SVR), are among the most robust and accurate methods of all well-known data mining algorithms. SVMs, originally developed by Vapnik in the 1990s [1-11], have a sound theoretical foundation rooted in statistical learning theory, require only as few as a dozen examples for training, and are often insensitive to the number of dimensions. In the past decade, SVMs have developed at a fast pace both in theory and in practice.

3.1 Support Vector Classifier

For a two-class linearly separable learning task, the aim of SVC is to find a hyperplane that separates the two classes of given samples with a maximal margin, which has been proved to offer the best generalization ability. Generalization ability means that the classifier not only performs well (e.g., in accuracy) on the training data, but also guarantees high predictive accuracy on future data drawn from the same distribution as the training data.

[Figure 3.1: Illustration of the optimal hyperplane in SVC for a linearly separable case. The figure shows the optimal hyperplane w^T x + b = 0 in the (x_1, x_2) plane, the shortest distances r^* from the closest point of each class, and the margin of separation \rho.]

Intuitively, a margin can be defined as the amount of space, or separation, between the two classes as defined by a hyperplane. Geometrically, the margin corresponds to the shortest distance from the closest data points to any point on the hyperplane. Figure 3.1 illustrates a geometric construction of the corresponding optimal hyperplane under the above conditions for a two-dimensional input space.

Let w and b denote the weight vector and the bias of the optimal hyperplane, respectively. The hyperplane can be defined as

    w^T x + b = 0    (3.1)

The desired directional (signed) geometrical distance from a sample x to the optimal hyperplane [12, 13] is

    r = \frac{g(x)}{\|w\|}    (3.2)

where g(x) = w^T x + b is the discriminant function [7] defined by the hyperplane, also called the functional margin of x given w and b. Consequently, SVC aims to find the parameters w and b of an optimal hyperplane that maximize the margin of separation [\rho in Equation (3.5)], which is determined by the shortest geometrical distances r^* from the two classes; for this reason SVC is also called the maximal margin classifier. Now, without loss of generality, we fix the functional margin [7] to be equal to 1; that is, given a training set \{(x_i, y_i)\}_{i=1}^{n} \subset \mathbb{R}^m \times \{\pm 1\}, we have

    w^T x_i + b \ge +1 \quad \text{for } y_i = +1
    w^T x_i + b \le -1 \quad \text{for } y_i = -1    (3.3)

The particular data points (x_i, y_i) for which the equality in the first or second part of Equation (3.3) holds are called support vectors; they are exactly the closest data points to the optimal hyperplane [13]. The geometrical distance from a support vector x^* to the optimal hyperplane is then

    r^* = \frac{g(x^*)}{\|w\|} = \begin{cases} \dfrac{1}{\|w\|} & \text{if } y^* = +1 \\ -\dfrac{1}{\|w\|} & \text{if } y^* = -1 \end{cases}    (3.4)
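To make Equations (3.2)-(3.4) concrete, the following minimal Python sketch computes the functional margin g(x) = w^T x + b and the signed geometric distance r = g(x)/\|w\| for a hypothetical hyperplane; the weight vector, bias, and sample points below are illustrative assumptions, not values taken from the chapter.

```python
# A minimal sketch (with assumed, illustrative values) of the functional
# margin g(x) = w^T x + b and the geometric distance r = g(x)/||w|| from
# Equation (3.2), assuming NumPy is installed.
import numpy as np

w = np.array([0.5, 0.5])   # hypothetical weight vector of the hyperplane
b = -1.0                   # hypothetical bias

def functional_margin(x):
    # g(x) = w^T x + b, the discriminant function of the hyperplane.
    return w @ x + b

def geometric_distance(x):
    # Signed distance from x to the hyperplane, Equation (3.2).
    return functional_margin(x) / np.linalg.norm(w)

x_plus = np.array([3.0, 1.0])    # lies on the margin boundary g(x) = +1
x_minus = np.array([1.0, -1.0])  # lies on the margin boundary g(x) = -1

print(geometric_distance(x_plus), geometric_distance(x_minus))
```

With these values the two sample points sit exactly on the margin boundaries g(x) = \pm 1, so their signed distances \pm 1/\|w\| match r^* in Equation (3.4).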
From Figure 3.1, the margin of separation \rho is clearly

    \rho = 2 r^* = \frac{2}{\|w\|}    (3.5)

To ensure that the maximum margin hyperplane is found, SVC maximizes \rho with respect to w and b:

    \max_{w,b} \frac{2}{\|w\|} \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge 1, \; i = 1, \dots, n    (3.6)

Equivalently,

    \min_{w,b} \frac{1}{2} \|w\|^2 \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge 1, \; i = 1, \dots, n    (3.7)

Here we use \|w\|^2 instead of \|w\| for the convenience of carrying out the subsequent optimization steps. Generally, we solve the constrained optimization problem in Equation (3.7), known as the primal problem, by the method of Lagrange multipliers. We construct the following Lagrange function:

    L(w, b, \alpha) = \frac{1}{2} w^T w - \sum_{i=1}^{n} \alpha_i \left[ y_i (w^T x_i + b) - 1 \right]    (3.8)

where \alpha_i is the Lagrange multiplier associated with the ith inequality constraint. Differentiating L(w, b, \alpha) with respect to w and b and setting the results equal to zero, we obtain the following two conditions of optimality:

    \frac{\partial L(w, b, \alpha)}{\partial w} = 0, \qquad \frac{\partial L(w, b, \alpha)}{\partial b} = 0    (3.9)

from which

    w = \sum_{i=1}^{n} \alpha_i y_i x_i, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0    (3.10)

Substituting Equation (3.10) into the Lagrange function in Equation (3.8), we obtain the corresponding dual problem:

    \max_{\alpha} \; W(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j x_i^T x_j
    \text{s.t.} \quad \sum_{i=1}^{n} \alpha_i y_i = 0, \qquad \alpha_i \ge 0, \; i = 1, \dots, n    (3.11)

At the same time, the Karush-Kuhn-Tucker complementary condition is

    \alpha_i \left[ y_i (w^T x_i + b) - 1 \right] = 0, \quad i = 1, \dots, n    (3.12)

Consequently, only the support vectors (x_i, y_i), which are the closest data points to the optimal hyperplane and determine the maximal margin, correspond to nonzero \alpha_i; all the other \alpha_i equal zero. The dual problem in Equation (3.11) is a typical convex quadratic programming optimization problem. In many cases, it can be solved efficiently to the global optimum by adopting appropriate optimization techniques.
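As an illustration of how the dual problem might be solved, the sketch below optimizes Equation (3.11) for a tiny hypothetical linearly separable dataset using SciPy's general-purpose SLSQP solver, then recovers w from Equation (3.10) and b from the KKT condition (3.12). The dataset, variable names, and solver choice are assumptions made for this example; dedicated QP or SMO-style solvers are normally used in practice.

```python
# A minimal sketch (not the chapter's own code) of solving the dual problem
# in Equation (3.11) for a toy dataset, assuming NumPy and SciPy are installed.
import numpy as np
from scipy.optimize import minimize

# Hypothetical 2-D linearly separable data: two points per class.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)

# Q_ij = y_i y_j x_i^T x_j, the matrix appearing in W(alpha).
Q = np.outer(y, y) * (X @ X.T)

def neg_dual(alpha):
    # Negative of W(alpha); we minimize because SciPy minimizes by default.
    return 0.5 * alpha @ Q @ alpha - alpha.sum()

constraints = [{"type": "eq", "fun": lambda a: a @ y}]  # sum_i alpha_i y_i = 0
bounds = [(0.0, None)] * n                              # alpha_i >= 0

res = minimize(neg_dual, np.zeros(n), bounds=bounds,
               constraints=constraints, method="SLSQP")
alpha = res.x

# Recover w from Equation (3.10) and b from the KKT condition (3.12),
# using the training point with the largest multiplier as a support vector.
w = (alpha * y) @ X
sv = np.argmax(alpha)
b = y[sv] - w @ X[sv]

print("alpha =", np.round(alpha, 4))
print("w =", w, " b =", float(b))
```

On this toy data, only the two points nearest the separating hyperplane should receive appreciably nonzero multipliers, consistent with the support vector interpretation above.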