您好,欢迎访问三七文档
OntheConvergenceofSpectralClusteringonRandomSamples:theNormalizedCaseUlrikevonLuxburg1,OlivierBousquet1,andMikhailBelkin21MaxPlanckInstituteforBiologicalCybernetics,T¨ubingen,Germany{ulrike.luxburg,olivier.bousquet}@tuebingen.mpg.de2TheUniversityofChicago,DepartmentofComputerSciencemisha@cs.uchicago.eduAbstract.Givenasetofnrandomlydrawnsamplepoints,spectralclusteringinitssimplestformusesthesecondeigenvectorofthegraphLaplacianmatrix,constructedonthesimilaritygraphbetweenthesam-plepoints,toobtainapartitionofthesample.Weareinterestedinthequestionhowspectralclusteringbehavesforgrowingsamplesizen.IncaseoneusesthenormalizedgraphLaplacian,weshowthatspectralclus-teringusuallyconvergestoanintuitivelyappealinglimitpartitionofthedataspace.WearguethatincaseoftheunnormalizedgraphLaplacian,equallystrongconvergenceresultsaredifficulttoobtain.1IntroductionClusteringisawidelyusedtechniqueinmachinelearning.Givenasetofdatapoints,oneisinterestedinpartitioningthedatabasedonacertainsimilarityamongthedatapoints.Ifweassumethatthedataisdrawnfromsomeunderly-ingprobabilitydistribution,whichoftenseemstobethenaturalmathematicalframework,thegoalbecomestopartitiontheprobabilityspaceintocertainre-gionswithhighsimilarityamongpoints.Inthissettingtheproblemofclusteringistwo-fold:–Assumingthattheunderlyingprobabilitydistributionisknown,whatisadesirableclusteringofthedataspace?–Givenfinitelymanydatapointssampledfromanunknownprobabilitydis-tribution,howcanwereconstructthatoptimalpartitionempiricallyonthefinitesample?Interestingly,whileextensiveliteratureexistsonclusteringandpartitioning,tothebestofourknowledgeveryfewalgorithmshavebeenanalyzedorshowntoconvergeforincreasingsamplesize.Someexceptionsarethek-meansalgorithm(cf.Pollard,1981),thesinglelinkagealgorithm(cf.Hartigan,1981),andtheclusteringalgorithmsuggestedbyNiyogiandKarmarkar(2000).Thegoalofthispaperistoinvestigatethelimitbehaviorofaclassofspectralclusteringalgorithms.SpectralclusteringisapopulartechniquegoingbacktoDonathandHoffman(1973)andFiedler(1973).Ithasbeenusedforloadbalancing(VanDriesscheandRoose,1995),parallelcomputations(HendricksonandLeland,1995),andVLSIdesign(HagenandKahng,1992).Recently,Laplacian-basedclusteringal-gorithmshavefoundsuccessinapplicationstoimagesegmentation(cf.ShiandMalik,2000).MethodsbasedongraphLaplacianshavealsobeenusedforotherproblemsinmachinelearning,includingsemi-supervisedlearning(cf.BelkinandNiyogi,toappear;Zhuetal.,2003).Whiletheoreticalpropertiesofspectralclus-teringhavebeenstudied(e.g.,GuatteryandMiller(1998),Weiss(1999),Kannanetal.(2000),MeilaandShi(2001),alsoseeChung(1997)foracomprehensivetheoreticaltreatmentofthespectralgraphtheory),wedonotknowofanyre-sultsdiscussingtheconvergenceofspectralclusteringorthespectraofgraphLaplaciansforincreasingsamplesize.Howeverforkernelmatrices,theconver-genceoftheeigenvaluesandeigenvectorshasalreadyattractedsomeattention(cf.WilliamsandSeeger,2000;Shawe-Tayloretal.,2002;Bengioetal.,2003).2BackgroundandnotationsLet(X,dist)beametricspace,BtheBorelσ-algebraonX,Paprobabilitymeasureon(X,B),andL2(P):=L2(X,B,P)thespaceofsquare-integrablefunctions.Letk:X×X→IRameasurable,symmetric,non-negativefunc-tionthatcomputesthesimilaritybetweenpointsinX.ForgivensamplepointsX1,...,Xndrawniidaccordingtothe(unknown)distributionPwedenotetheempiricaldistributionbyPn.WedefinethesimilaritymatrixasKn:=(k(Xi,Xj))i,j=1,...,nandthedegreematrixDnasthediagonalmatrixwithdiag-onalentriesdi:=Pnj=1k(Xi,Xj).TheunnormalizeddiscreteLaplacianmatrixisdefinedasLn:=Dn−Kn.Forsymmetricandnon-negativek,Lnisapositivesemi-definitelinearoperatoronIRn.Leta=(a1,...,an)thesecondeigenvec-torofLn.Here,“secondeigenvector”referstotheeigenvectorbelongingtothesecondsmallesteigenvalue,wheretheeigenvaluesλ1≤λ2...≤λnarecountedwithmultiplicity.Inanutshell,spectralclusteringinitssimplesformpartitionsthesamplepoints(Xi)iintotwo(orseveral)groupsbythresholdingthesecondeigenvectorofLn:pointXibelongstocluster1ifaib,andtocluster2oth-erwise,whereb∈IRissomeappropriateconstant.AnintuitiveexplanationofwhythisworksisdiscussedinSection4.Often,spectralclusteringisalsoperformedwithanormalizedversionofthematrixLn.TwocommonwaysofnormalizingareL0n:=D−1/2nLnD−1/2norL00n:=D−1nLn.Theeigenvaluesandeigenvectorsofbothmatricesarecloselyrelated.DefinethenormalizedsimilaritymatricesH0n:=D−1/2nKnD−1/2nandH00n:=D−1nKn.ItcanbeseenbymultiplyingtheeigenvalueequationL0nv=λvfromleftwithD−1/2nthatv∈IRniseigenvectorofL0nwitheigenvalueλiffD−1/2nviseigenvectorofL00nwitheigenvalueλ.Furthermore,rearrangingtheeigenvalueequationsforL0nandL00nshowsthatv∈IRnisaneigenvectorofL0nwitheigenvalueλiffviseigenvectorofH0nwitheigenvalue(1−λ),andthatv∈IRnisaneigenvectorofL00nwitheigenvalueλiffviseigenvectorofH00nwitheigenvalue(1−λ).Thus,propertiesaboutthespectrumofoneofthema-tricesL0n,L00n,H0n,orH00ncanbereformulatedforthethreeothermatricesaswell.Inthefollowingwewanttorecallsomedefinitionsandfactsfrompertur-bationtheoryforboundedoperators.Thestandardreferenceforgeneralper-turbationtheoryisKato(1966),forperturbationtheoryinHilbertspaceswealsorec
本文标题:On the convergence of spectral clustering on rando
链接地址:https://www.777doc.com/doc-3316496 .html