$\ell_{2,1}$-Norm Regularized Discriminative Feature Selection for Unsupervised Learning

Yi Yang^1, Heng Tao Shen^1, Zhigang Ma^2, Zi Huang^1, Xiaofang Zhou^1
^1 School of Information Technology & Electrical Engineering, The University of Queensland.
^2 Department of Information Engineering & Computer Science, University of Trento.
yangyizju@yahoo.com.cn, shenht@itee.uq.edu.au, ma@disi.unitn.it, {huang,zxf}@itee.uq.edu.au

Abstract

Compared with supervised learning for feature selection, it is much more difficult to select the discriminative features in unsupervised learning due to the lack of label information. Traditional unsupervised feature selection algorithms usually select the features which best preserve the data distribution, e.g., manifold structure, of the whole feature set. Under the assumption that the class label of input data can be predicted by a linear classifier, we incorporate discriminative analysis and $\ell_{2,1}$-norm minimization into a joint framework for unsupervised feature selection. Different from existing unsupervised feature selection algorithms, our algorithm selects the most discriminative feature subset from the whole feature set in batch mode. Extensive experiments on different data types demonstrate the effectiveness of our algorithm.

Introduction

In many areas, such as computer vision, pattern recognition and biological study, data are represented by high dimensional feature vectors. Feature selection aims to select a subset of features from the high dimensional feature set for a compact and accurate data representation. It plays a twofold role in improving the performance of data analysis. First, the dimension of the selected feature subset is much lower, making the subsequent computation on the input data more efficient. Second, noisy features are eliminated for a better data representation, resulting in more accurate clustering and classification results. During recent years, feature selection has attracted much research attention, and several new feature selection algorithms have been proposed with a variety of applications.

Feature selection algorithms can be roughly classified into two groups, i.e., supervised feature selection and unsupervised feature selection. Supervised feature selection algorithms, e.g., Fisher score [Duda et al., 2001], robust regression [Nie et al., 2010], sparse multi-output regression [Zhao et al., 2010] and trace ratio [Nie et al., 2008], usually select features according to the labels of the training data. Because discriminative information is enclosed in the labels, supervised feature selection is usually able to select discriminative features. In unsupervised scenarios, however, there is no label information directly available, making it much more difficult to select the discriminative features. A frequently used criterion in unsupervised learning is to select the features which best preserve the data similarity or manifold structure derived from the whole feature set [He et al., 2005; Zhao and Liu, 2007; Cai et al., 2010]. However, discriminative information is neglected, although it has been demonstrated to be important in data analysis [Fukunaga, 1990].

Most of the traditional supervised and unsupervised feature selection algorithms evaluate the importance of each feature individually [Duda et al., 2001; He et al., 2005; Zhao and Liu, 2007] and select features one by one. A limitation is that the correlation among features is neglected [Zhao et al., 2010; Cai et al., 2010]. More recently, researchers have applied the two-step approach, i.e., spectral regression, to supervised and unsupervised feature selection [Zhao et al., 2010; Cai et al., 2010]. These efforts have shown that it is better to evaluate the importance of the selected features jointly. In this paper, we propose a new unsupervised feature selection algorithm which simultaneously exploits discriminative information and feature correlations. Because we utilize local discriminative information, the manifold structure is considered too. While [Zhao et al., 2010; Cai et al., 2010] also select features in batch mode, our algorithm is a one-step approach and it is able to select the discriminative features for unsupervised learning. We also propose an efficient algorithm to optimize the problem.
The Objective Function

In this section, we give the objective function of the proposed Unsupervised Discriminative Feature Selection (UDFS) algorithm. In the next section, we propose an efficient algorithm to optimize this objective function. It is worth mentioning that UDFS aims to select the most discriminative features for data representation, where manifold structure is considered, making it different from the existing unsupervised feature selection algorithms.

Denote $X = \{x_1, x_2, \ldots, x_n\}$ as the training set, where $x_i \in \mathbb{R}^d$ ($1 \le i \le n$) is the $i$-th datum and $n$ is the total number of training data. In this paper, $I$ is the identity matrix. For a constant $m$, $\mathbf{1}_m \in \mathbb{R}^m$ is a column vector with all of its elements being 1, and $H_m = I - \frac{1}{m}\mathbf{1}_m\mathbf{1}_m^T \in \mathbb{R}^{m \times m}$. For an arbitrary matrix $A \in \mathbb{R}^{r \times p}$, its $\ell_{2,1}$-norm is defined as

$$\|A\|_{2,1} = \sum_{i=1}^{r}\sqrt{\sum_{j=1}^{p} A_{ij}^{2}}. \qquad (1)$$

Suppose the $n$ training data $x_1, x_2, \ldots, x_n$ are sampled from $c$ classes and there are $n_i$ samples in the $i$-th class. We define $y_i \in \{0,1\}^{c \times 1}$ ($1 \le i \le n$) as the label vector of $x_i$. The $j$-th element of $y_i$ is 1 if $x_i$ belongs to the $j$-th class, and 0 otherwise. $Y = [y_1, y_2, \ldots, y_n]^T \in \{0,1\}^{n \times c}$ is the label matrix. The total scatter matrix $S_t$ and the between-class scatter matrix $S_b$ are defined as follows [Fukunaga, 1990]:

$$S_t = \sum_{i=1}^{n}(x_i - \mu)(x_i - \mu)^T = \tilde{X}\tilde{X}^T \qquad (2)$$

$$S_b = \sum_{i=1}^{c} n_i(\mu_i - \mu)(\mu_i - \mu)^T = \tilde{X}GG^T\tilde{X}^T \qquad (3)$$

where $\mu$ is the mean of all samples, $\mu_i$ is the mean of the samples in the $i$-th class, $n_i$ is the number of samples in the $i$-th class, and $\tilde{X} = XH_n$ is the data matrix after being centered.
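To make the notation above concrete, the following is a minimal NumPy sketch (it is not part of the original paper, and all function and variable names are our own) that computes the $\ell_{2,1}$-norm of Eq. (1), the centering matrix $H_m$, and the scatter matrices $S_t$ and $S_b$ from the summation forms of Eqs. (2) and (3), and checks numerically that $S_t = \tilde{X}\tilde{X}^T$ with $\tilde{X} = XH_n$:

```python
import numpy as np

def l21_norm(A):
    """l2,1-norm of Eq. (1): sum over rows of the row-wise l2 norms."""
    return np.sqrt((A ** 2).sum(axis=1)).sum()

def centering_matrix(m):
    """H_m = I - (1/m) 1_m 1_m^T; right-multiplying X by H_n centers the columns of X."""
    return np.eye(m) - np.ones((m, m)) / m

def scatter_matrices(X, labels):
    """Total scatter S_t (Eq. 2) and between-class scatter S_b (Eq. 3).

    X is d x n with samples as columns; labels is a length-n integer array.
    """
    d, n = X.shape
    mu = X.mean(axis=1, keepdims=True)                     # overall mean of all samples
    St = (X - mu) @ (X - mu).T                             # sum_i (x_i - mu)(x_i - mu)^T
    Sb = np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[:, labels == c]
        mu_c = Xc.mean(axis=1, keepdims=True)              # mean of the c-th class
        Sb += Xc.shape[1] * (mu_c - mu) @ (mu_c - mu).T    # n_c (mu_c - mu)(mu_c - mu)^T
    return St, Sb

# Sanity check on synthetic data: S_t equals X~ X~^T with X~ = X H_n.
rng = np.random.default_rng(0)
X = rng.random((5, 20))                                    # d = 5 features, n = 20 samples
labels = rng.integers(0, 3, size=20)                       # 3 hypothetical classes
St, Sb = scatter_matrices(X, labels)
X_tilde = X @ centering_matrix(20)
assert np.allclose(St, X_tilde @ X_tilde.T)
print(l21_norm(X_tilde))                                   # l2,1-norm of the centered data
```

The between-class scatter is computed here from the class-mean form on the left-hand side of Eq. (3), so the matrix $G$ appearing on its right-hand side (whose definition falls outside this excerpt) is not needed in the sketch.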