$\ell_{2,1}$-Norm Regularized Discriminative Feature Selection for Unsupervised Learning

Yi Yang^1, Heng Tao Shen^1, Zhigang Ma^2, Zi Huang^1, Xiaofang Zhou^1
^1 School of Information Technology & Electrical Engineering, The University of Queensland.
^2 Department of Information Engineering & Computer Science, University of Trento.
yangyizju@yahoo.com.cn, shenht@itee.uq.edu.au, ma@disi.unitn.it, {huang,zxf}@itee.uq.edu.au

Abstract

Compared with supervised learning for feature selection, it is much more difficult to select the discriminative features in unsupervised learning due to the lack of label information. Traditional unsupervised feature selection algorithms usually select the features which best preserve the data distribution, e.g., manifold structure, of the whole feature set. Under the assumption that the class label of input data can be predicted by a linear classifier, we incorporate discriminative analysis and $\ell_{2,1}$-norm minimization into a joint framework for unsupervised feature selection. Different from existing unsupervised feature selection algorithms, our algorithm selects the most discriminative feature subset from the whole feature set in batch mode. Extensive experiments on different data types demonstrate the effectiveness of our algorithm.

Introduction

In many areas, such as computer vision, pattern recognition and biological study, data are represented by high dimensional feature vectors. Feature selection aims to select a subset of features from the high dimensional feature set for a compact and accurate data representation. It plays a twofold role in improving the performance of data analysis. First, the dimension of the selected feature subset is much lower, making the subsequent computation on the input data more efficient. Second, noisy features are eliminated for a better data representation, resulting in more accurate clustering and classification results. During recent years, feature selection has attracted much research attention, and several new feature selection algorithms have been proposed with a variety of applications.

Feature selection algorithms can be roughly classified into two groups, i.e., supervised feature selection and unsupervised feature selection. Supervised feature selection algorithms, e.g., Fisher score [Duda et al., 2001], robust regression [Nie et al., 2010], sparse multi-output regression [Zhao et al., 2010] and trace ratio [Nie et al., 2008], usually select features according to the labels of the training data. Because discriminative information is enclosed in the labels, supervised feature selection is usually able to select discriminative features. In unsupervised scenarios, however, there is no label information directly available, making it much more difficult to select the discriminative features. A frequently used criterion in unsupervised learning is to select the features which best preserve the data similarity or manifold structure derived from the whole feature set [He et al., 2005; Zhao and Liu, 2007; Cai et al., 2010]. However, discriminative information is neglected, although it has been demonstrated to be important in data analysis [Fukunaga, 1990].

Most of the traditional supervised and unsupervised feature selection algorithms evaluate the importance of each feature individually [Duda et al., 2001; He et al., 2005; Zhao and Liu, 2007] and select features one by one. A limitation is that the correlation among features is neglected [Zhao et al., 2010; Cai et al., 2010]. More recently, researchers have applied the two-step approach, i.e., spectral regression, to supervised and unsupervised feature selection [Zhao et al., 2010; Cai et al., 2010]. These efforts have shown that it is better to evaluate the importance of the selected features jointly. In this paper, we propose a new unsupervised feature selection algorithm which simultaneously exploits discriminative information and feature correlations. Because we utilize local discriminative information, the manifold structure is considered too. While [Zhao et al., 2010; Cai et al., 2010] also select features in batch mode, our algorithm is a one-step approach and it is able to select the discriminative features for unsupervised learning. We also propose an efficient algorithm to optimize the problem.
The Objective Function

In this section, we give the objective function of the proposed Unsupervised Discriminative Feature Selection (UDFS) algorithm. In the next section, we propose an efficient algorithm to optimize this objective function. It is worth mentioning that UDFS aims to select the most discriminative features for data representation, where manifold structure is considered, making it different from the existing unsupervised feature selection algorithms.

Denote $X = \{x_1, x_2, \ldots, x_n\}$ as the training set, where $x_i \in \mathbb{R}^d$ ($1 \le i \le n$) is the $i$-th datum and $n$ is the total number of training data. In this paper, $I$ is the identity matrix. For a constant $m$, $\mathbf{1}_m \in \mathbb{R}^m$ is a column vector with all of its elements being 1, and $H_m = I - \frac{1}{m}\mathbf{1}_m\mathbf{1}_m^T \in \mathbb{R}^{m \times m}$. For an arbitrary matrix $A \in \mathbb{R}^{r \times p}$, its $\ell_{2,1}$-norm is defined as

$$\|A\|_{2,1} = \sum_{i=1}^{r}\sqrt{\sum_{j=1}^{p} A_{ij}^{2}}. \qquad (1)$$

Suppose the $n$ training data $x_1, x_2, \ldots, x_n$ are sampled from $c$ classes and there are $n_i$ samples in the $i$-th class. We define $y_i \in \{0,1\}^{c \times 1}$ ($1 \le i \le n$) as the label vector of $x_i$. The $j$-th element of $y_i$ is 1 if $x_i$ belongs to the $j$-th class, and 0 otherwise. $Y = [y_1, y_2, \ldots, y_n]^T \in \{0,1\}^{n \times c}$ is the label matrix. The total scatter matrix $S_t$ and the between-class scatter matrix $S_b$ are defined as follows [Fukunaga, 1990]:

$$S_t = \sum_{i=1}^{n}(x_i - \mu)(x_i - \mu)^T = \tilde{X}\tilde{X}^T \qquad (2)$$

$$S_b = \sum_{i=1}^{c} n_i(\mu_i - \mu)(\mu_i - \mu)^T = \tilde{X}GG^T\tilde{X}^T \qquad (3)$$

where $\mu$ is the mean of all samples, $\mu_i$ is the mean of the samples in the $i$-th class, $n_i$ is the number of samples in the $i$-th class, and $\tilde{X} = XH_n$ is the data matrix after being centered.
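To make the notation above concrete, the following is a minimal NumPy sketch (it is not part of the original paper, and all function and variable names are our own) that computes the $\ell_{2,1}$-norm of Eq. (1), the centering matrix $H_m$, and the scatter matrices $S_t$ and $S_b$ from the summation forms of Eqs. (2) and (3), and checks numerically that $S_t = \tilde{X}\tilde{X}^T$ with $\tilde{X} = XH_n$:

```python
import numpy as np

def l21_norm(A):
    """l2,1-norm of Eq. (1): sum over rows of the row-wise l2 norms."""
    return np.sqrt((A ** 2).sum(axis=1)).sum()

def centering_matrix(m):
    """H_m = I - (1/m) 1_m 1_m^T; right-multiplying X by H_n centers the columns of X."""
    return np.eye(m) - np.ones((m, m)) / m

def scatter_matrices(X, labels):
    """Total scatter S_t (Eq. 2) and between-class scatter S_b (Eq. 3).

    X is d x n with samples as columns; labels is a length-n integer array.
    """
    d, n = X.shape
    mu = X.mean(axis=1, keepdims=True)                     # overall mean of all samples
    St = (X - mu) @ (X - mu).T                             # sum_i (x_i - mu)(x_i - mu)^T
    Sb = np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[:, labels == c]
        mu_c = Xc.mean(axis=1, keepdims=True)              # mean of the c-th class
        Sb += Xc.shape[1] * (mu_c - mu) @ (mu_c - mu).T    # n_c (mu_c - mu)(mu_c - mu)^T
    return St, Sb

# Sanity check on synthetic data: S_t equals X~ X~^T with X~ = X H_n.
rng = np.random.default_rng(0)
X = rng.random((5, 20))                                    # d = 5 features, n = 20 samples
labels = rng.integers(0, 3, size=20)                       # 3 hypothetical classes
St, Sb = scatter_matrices(X, labels)
X_tilde = X @ centering_matrix(20)
assert np.allclose(St, X_tilde @ X_tilde.T)
print(l21_norm(X_tilde))                                   # l2,1-norm of the centered data
```

The between-class scatter is computed here from the class-mean form on the left-hand side of Eq. (3), so the matrix $G$ appearing on its right-hand side (whose definition falls outside this excerpt) is not needed in the sketch.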