Support Vector Machines
Chih-Jen Lin
Department of Computer Science, National Taiwan University
Talk at Machine Learning Summer School 2006, Taipei

Outline
- Basic concepts
- SVM primal/dual problems
- Training linear and nonlinear SVMs
- Parameter/kernel selection and practical issues
- Multi-class classification
- Discussion and conclusions

Basic concepts

Why SVM and Kernel Methods
- SVM: in many cases competitive with existing classification methods
- Relatively easy to use
- Kernel techniques: many extensions
  - Regression, density estimation, kernel PCA, etc.

Support Vector Classification
- Training vectors: x_i, i = 1, ..., l
  Feature vectors. For example, a patient = [height, weight, ...]
- Consider a simple case with two classes. Define an indicator vector y:
  y_i = +1 if x_i is in class 1, y_i = -1 if x_i is in class 2
- A hyperplane which separates all data

A separating hyperplane
[Figure: two classes of points separated by the hyperplanes w^T x + b = +1, 0, -1]
- A separating hyperplane: w^T x + b = 0, with
  w^T x_i + b > 0 if y_i = +1 and w^T x_i + b < 0 if y_i = -1
- Decision function: f(x) = sgn(w^T x + b), where x is a test point
- Many possible choices of w and b

Maximal Margin
- Distance between w^T x + b = 1 and w^T x + b = -1: 2/\|w\| = 2/\sqrt{w^T w}
- A quadratic programming problem [Boser et al., 1992]:
  \min_{w,b} \tfrac{1}{2} w^T w
  \text{subject to } y_i (w^T x_i + b) \ge 1, \; i = 1, \dots, l.

Data May Not Be Linearly Separable
- An example [figure omitted]
- Allow training errors
- Use a higher dimensional (maybe infinite) feature space
  \phi(x) = (\phi_1(x), \phi_2(x), \dots)

Standard SVM [Cortes and Vapnik, 1995]
  \min_{w,b,\xi} \tfrac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i
  \text{subject to } y_i (w^T \phi(x_i) + b) \ge 1 - \xi_i, \; \xi_i \ge 0, \; i = 1, \dots, l.
- Example: x \in R^3, \phi(x) \in R^{10}:
  \phi(x) = (1, \sqrt{2} x_1, \sqrt{2} x_2, \sqrt{2} x_3, x_1^2, x_2^2, x_3^2, \sqrt{2} x_1 x_2, \sqrt{2} x_1 x_3, \sqrt{2} x_2 x_3)

Finding the Decision Function
- w: maybe infinitely many variables
- The dual problem:
  \min_{\alpha} \tfrac{1}{2} \alpha^T Q \alpha - e^T \alpha
  \text{subject to } 0 \le \alpha_i \le C, \; i = 1, \dots, l, \; y^T \alpha = 0,
  where Q_{ij} = y_i y_j \phi(x_i)^T \phi(x_j) and e = [1, \dots, 1]^T
- At the optimum, w = \sum_{i=1}^{l} \alpha_i y_i \phi(x_i)
- A finite problem: #variables = #training data

Kernel Tricks
- Q_{ij} = y_i y_j \phi(x_i)^T \phi(x_j) needs a closed form
- Example: x_i \in R^3, \phi(x_i) \in R^{10}:
  \phi(x_i) = (1, \sqrt{2}(x_i)_1, \sqrt{2}(x_i)_2, \sqrt{2}(x_i)_3, (x_i)_1^2, (x_i)_2^2, (x_i)_3^2, \sqrt{2}(x_i)_1 (x_i)_2, \sqrt{2}(x_i)_1 (x_i)_3, \sqrt{2}(x_i)_2 (x_i)_3)
- Then \phi(x_i)^T \phi(x_j) = (1 + x_i^T x_j)^2 (see the numerical check after this slide)
- Kernel: K(x, y) = \phi(x)^T \phi(y); common kernels:
  e^{-\gamma \|x_i - x_j\|^2} (Radial Basis Function)
  (x_i^T x_j / a + b)^d (Polynomial kernel)
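The degree-2 example above can be checked directly. Below is a minimal numerical sketch, assuming NumPy is available (the helper name phi and the two sample points are illustrative, not from the slides): it builds the explicit map into R^10 and confirms that its inner product equals the closed form (1 + x_i^T x_j)^2.

```python
# Sketch: verify the kernel trick for the degree-2 polynomial map (assumes NumPy).
import numpy as np

def phi(x):
    # Explicit feature map for x in R^3, as written on the slide.
    x1, x2, x3 = x
    s = np.sqrt(2.0)
    return np.array([1.0,
                     s * x1, s * x2, s * x3,
                     x1**2, x2**2, x3**2,
                     s * x1 * x2, s * x1 * x3, s * x2 * x3])

xi = np.array([1.0, 2.0, -1.0])   # illustrative points in R^3
xj = np.array([0.5, -0.3, 2.0])

lhs = phi(xi) @ phi(xj)           # inner product computed in R^10
rhs = (1.0 + xi @ xj) ** 2        # kernel value; phi is never formed
print(lhs, rhs)                   # the two numbers agree
```

The closed form costs one inner product in the original space per kernel evaluation, regardless of the dimension of \phi.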
Can be inner product in infinite dimensional space
- Assume x \in R^1 and \gamma > 0. Then
  e^{-\gamma \|x_i - x_j\|^2} = e^{-\gamma (x_i - x_j)^2} = e^{-\gamma x_i^2 + 2\gamma x_i x_j - \gamma x_j^2}
  = e^{-\gamma x_i^2 - \gamma x_j^2} \left( 1 + \frac{2\gamma x_i x_j}{1!} + \frac{(2\gamma x_i x_j)^2}{2!} + \frac{(2\gamma x_i x_j)^3}{3!} + \cdots \right)
  = e^{-\gamma x_i^2 - \gamma x_j^2} \left( 1 \cdot 1 + \sqrt{\tfrac{2\gamma}{1!}}\, x_i \cdot \sqrt{\tfrac{2\gamma}{1!}}\, x_j + \sqrt{\tfrac{(2\gamma)^2}{2!}}\, x_i^2 \cdot \sqrt{\tfrac{(2\gamma)^2}{2!}}\, x_j^2 + \sqrt{\tfrac{(2\gamma)^3}{3!}}\, x_i^3 \cdot \sqrt{\tfrac{(2\gamma)^3}{3!}}\, x_j^3 + \cdots \right)
  = \phi(x_i)^T \phi(x_j),
  where \phi(x) = e^{-\gamma x^2} \left( 1, \sqrt{\tfrac{2\gamma}{1!}}\, x, \sqrt{\tfrac{(2\gamma)^2}{2!}}\, x^2, \sqrt{\tfrac{(2\gamma)^3}{3!}}\, x^3, \cdots \right)^T.
  (A truncated-series check of this identity appears at the end of this part.)

More about Kernels
- How do we know kernels help to separate data?
- In R^l, any l independent vectors are linearly separable:
  \begin{bmatrix} (x_1)^T \\ \vdots \\ (x_l)^T \end{bmatrix} w = \begin{bmatrix} +e \\ -e \end{bmatrix}
- If K is positive definite, the data are linearly separable:
  K = LL^T transforms the training points to independent vectors in R^l

So what kind of kernel should I use?
- What kind of functions are valid kernels?
- How to decide kernel parameters?
- Will be discussed later

Decision function
- At the optimum, w = \sum_{i=1}^{l} \alpha_i y_i \phi(x_i)
- Decision function:
  w^T \phi(x) + b = \sum_{i=1}^{l} \alpha_i y_i \phi(x_i)^T \phi(x) + b = \sum_{i=1}^{l} \alpha_i y_i K(x_i, x) + b
- Only the \phi(x_i) with \alpha_i > 0 are used, so these x_i are the support vectors
  (see the sketch at the end of this part)

Support Vectors: More Important Data
[Figure: a toy two-class data set illustrating that the support vectors, the points near the decision boundary, are the more important data]

- So we have roughly shown the basic ideas of SVM
- A 3-D demonstration: ~cjlin/libsvmtools/svmtoy3d
- Further references, for example, [Cristianini and Shawe-Taylor, 2000; Schölkopf and Smola, 2002]
- Also see the discussion on the kernel machines blackboard

SVM primal/dual problems

Deriving the Dual
- Consider the problem without \xi_i:
  \min_{w,b} \tfrac{1}{2} w^T w
  \text{subject to } y_i (w^T \phi(x_i) + b) \ge 1, \; i = 1, \dots, l.
- Its dual:
  \min_{\alpha} \tfrac{1}{2} \alpha^T Q \alpha - e^T \alpha
  \text{subject to } 0 \le \alpha_i, \; i = 1, \dots, l, \; y^T \alpha = 0.

Lagrangian Dual
  \max_{\alpha \ge 0} \left( \min_{w,b} L(w, b, \alpha) \right),
  where L(w, b, \alpha) = \tfrac{1}{2} \|w\|^2 - \sum_{i=1}^{l} \alpha_i \left( y_i (w^T \phi(x_i) + b) - 1 \right)
- Strong duality (be careful about this):
  \min \text{Primal} = \max_{\alpha \ge 0} \left( \min_{w,b} L(w, b, \alpha) \right)

Simplify the dual
- When \alpha is fixed,
  \min_{w,b} L(w, b, \alpha) =
  \begin{cases}
    -\infty & \text{if } \sum_{i=1}^{l} \alpha_i y_i \ne 0, \\
    \min_{w} \tfrac{1}{2} w^T w - \sum_{i=1}^{l} \alpha_i \left[ y_i w^T \phi(x_i) - 1 \right] & \text{if } \sum_{i=1}^{l} \alpha_i y_i = 0.
  \end{cases}
- If \sum_{i=1}^{l} \alpha_i y_i \ne 0, the term -b \sum_{i=1}^{l} \alpha_i y_i in L(w, b, \alpha) can be decreased to -\infty
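The series on the "Can be inner product in infinite dimensional space" slide above converges quickly, so even a truncated feature map reproduces the RBF kernel value. A minimal sketch, assuming NumPy (the value of \gamma, the two points, and the truncation length are illustrative choices, not from the slides):

```python
# Sketch: truncate phi(x) = e^{-gamma x^2} (1, sqrt(2*gamma/1!) x, sqrt((2*gamma)^2/2!) x^2, ...)
# and compare the inner product with the closed-form RBF kernel (assumes NumPy).
import numpy as np
from math import factorial

gamma = 0.7            # illustrative values
xi, xj = 0.8, -0.4
n_terms = 12           # truncation length

def phi(x):
    coeffs = np.array([np.sqrt((2 * gamma) ** k / factorial(k)) for k in range(n_terms)])
    powers = np.array([x ** k for k in range(n_terms)])
    return np.exp(-gamma * x ** 2) * coeffs * powers

approx = phi(xi) @ phi(xj)               # truncated inner product
exact = np.exp(-gamma * (xi - xj) ** 2)  # closed-form RBF kernel
print(approx, exact)                     # agree to many decimal places
```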
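To connect the dual variables back to the decision function slide, here is a minimal sketch assuming NumPy and scikit-learn are installed (the toy data, \gamma, and C are illustrative). An RBF-kernel SVM is trained with an off-the-shelf solver, and the decision value \sum_{i=1}^{l} \alpha_i y_i K(x_i, x) + b is rebuilt by hand from the support vectors and dual coefficients.

```python
# Sketch: rebuild the SVM decision value sum_i alpha_i y_i K(x_i, x) + b
# from the dual solution (assumes NumPy and scikit-learn).
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) + 2.0, rng.randn(20, 2) - 2.0])  # two toy blobs
y = np.array([1] * 20 + [-1] * 20)

gamma = 0.5
clf = SVC(kernel="rbf", C=1.0, gamma=gamma).fit(X, y)

sv = clf.support_vectors_        # the x_i with alpha_i > 0
alpha_y = clf.dual_coef_[0]      # alpha_i * y_i for those support vectors
b = clf.intercept_[0]

x = np.array([0.3, -0.7])                              # a test point
K = np.exp(-gamma * np.sum((sv - x) ** 2, axis=1))     # K(x_i, x) for each support vector
manual = alpha_y @ K + b

print(manual, clf.decision_function(x.reshape(1, -1))[0])  # the two values match
```

Only the support vectors enter the sum, which is the sparsity highlighted on the "Support Vectors: More Important Data" slide.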