International Journal of Approximate Reasoning 53 (2012) 912–926

An efficient rough feature selection algorithm with a multi-granulation view

Jiye Liang a,∗, Feng Wang a,b, Chuangyin Dang b, Yuhua Qian a

a Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, School of Computer and Information Technology, Shanxi University, Taiyuan 030006, Shanxi, China
b Department of System Engineering and Engineering Management, City University of Hong Kong, Hong Kong

∗ Corresponding author. Tel./fax: +86 0351 7018176.
E-mail addresses: ljy@sxu.edu.cn (J. Liang), sxuwangfeng@126.com (F. Wang), mecdang@cityu.edu.hk (C. Dang), jinchengqyh@126.com (Y. Qian).

ARTICLE INFO

Article history:
Received 15 April 2011
Received in revised form 27 February 2012
Accepted 29 February 2012
Available online 13 March 2012

Keywords:
Feature selection
Multi-granulation view
Rough set theory
Large-scale data sets

ABSTRACT

Feature selection is a challenging problem in many areas such as pattern recognition, machine learning and data mining. Rough set theory, as a valid soft computing tool for analyzing various types of data, has been widely applied to select helpful features (also called attribute reduction). Many feature selection algorithms based on rough set theory have been developed in the literature; however, they are very time-consuming when data sets are large. To overcome this limitation, we propose in this paper an efficient rough feature selection algorithm for large-scale data sets, which is inspired by multi-granulation. A sub-table of a data set can be considered as a small granularity. Given a large-scale data set, the algorithm first selects different small granularities and then estimates on each small granularity the reduct of the original data set. By fusing all of these estimates together, the algorithm obtains an approximate reduct. Because the total time spent on computing reducts for the sub-tables is much less than that for the original large-scale table, the algorithm yields a feature subset (the approximate reduct) in a much shorter time. According to several decision performance measures, experimental results show that the proposed algorithm is feasible and efficient for large-scale data sets.

© 2012 Elsevier Inc. All rights reserved.
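The following Python sketch illustrates the general idea described in the abstract: several sub-tables are drawn as small granularities, a reduct is estimated on each, and the estimates are fused into an approximate reduct. The uniform sampling of objects, the placeholder function heuristic_reduct, and the majority-vote fusion rule are illustrative assumptions only, not the exact procedure developed in this paper.

    import random

    def multi_granulation_reduct(table, n_subtables, subtable_size, heuristic_reduct):
        # table: list of objects (rows); heuristic_reduct: any conventional
        # heuristic reduction algorithm applied to a sub-table, returning a
        # set of attributes that estimates a reduct of the original table.
        votes = {}
        for _ in range(n_subtables):
            # A sub-table (small granularity) is a random sample of objects.
            subtable = random.sample(table, subtable_size)
            for attr in heuristic_reduct(subtable):
                votes[attr] = votes.get(attr, 0) + 1
        # Fusion step: the estimates are fused by a simple majority vote
        # over the sub-table reducts (an assumption made for illustration).
        return {attr for attr, count in votes.items() if count > n_subtables / 2}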
1. Introduction

As a common technique for data preprocessing in pattern recognition, machine learning and data mining, feature selection has attracted much attention in recent years [5–7,20,21,23,26,30,40]. In practice, databases nowadays grow quickly not only in the number of rows (objects) but also in the number of columns (features). Tens, hundreds or even thousands of features are stored in databases in some real-world applications, which results in high-dimensional data. However, only a limited number of features are useful in practice; an excessive number of features may significantly slow down the learning process, and irrelevant or redundant features may deteriorate the performance of learning algorithms [12,13,38]. To ease this situation, it is desirable to remove redundant features and select informative ones, so as to decrease the cost of measuring, storing and transmitting data, shorten the processing time, and obtain more compact classification models with better generalization.

Rough set theory, proposed by Pawlak [31–33], is a relatively new soft computing tool for the analysis of vague descriptions of objects, and has become a popular mathematical framework for pattern recognition, image processing, feature selection, rule extraction, neuro-computing, conflict analysis, decision support, granular computing, data mining and knowledge discovery from large data sets [3,4,8,28,36,50,51]. In rough set theory, an important concept is attribute reduction (or approximate reduct), which can be considered a specific kind of feature selection. In other words, based on rough set theory, one can select useful features from a given data table. Attribute reduction does not attempt to maximize class separability but rather to retain the discernibility of the original features for the objects in the universe [15,16,41,44,52].

As one of the most important research topics accompanying the fast development of rough set theory, attribute reduction has attracted wide attention, and many attribute reduction techniques have been developed over the last twenty years. Applying the discernibility matrix, Skowron [42] proposed an attribute reduction algorithm based on computing a disjunctive normal form, which is able to obtain all attribute reducts of a given table; however, finding the minimal reduct of a decision table is an NP-hard problem. Kryszkiewicz and Lasek [22] proposed an approach to computing the minimal set of attributes that functionally determine a decision attribute. These two attribute reduction algorithms are usually computationally very expensive, especially when dealing with large-scale data sets of high dimension. To overcome this difficulty, many heuristic attribute reduction algorithms have been developed in rough set theory [11,13,24,25,39,35,43,45,46,48]. A heuristic attribute reduction algorithm can extract a single reduct from a given table in a relatively short time. To further reduce the computational time, Qian et al. [37] developed, on the basis of four kinds of common heuristic reduction algorithms, a common accelerator to improve the time efficiency of a heuristic attribute reduction algorithm.
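As a small illustration of the discernibility-matrix construction underlying Skowron's approach, the following Python sketch builds the matrix for a toy decision table. The table and attribute names are invented for the example, and the step of extracting all reducts from the resulting disjunctive normal form is omitted.

    def discernibility_matrix(objects, condition_attrs, decision_attr):
        # Entry (i, j) collects the condition attributes on which objects i and j
        # differ, recorded only for pairs with different decision values.
        matrix = {}
        for i, x in enumerate(objects):
            for j in range(i + 1, len(objects)):
                y = objects[j]
                if x[decision_attr] != y[decision_attr]:
                    matrix[(i, j)] = {a for a in condition_attrs if x[a] != y[a]}
        return matrix

    # A toy decision table (invented for illustration).
    table = [
        {"headache": "yes", "temperature": "high",   "flu": "yes"},
        {"headache": "no",  "temperature": "high",   "flu": "yes"},
        {"headache": "no",  "temperature": "normal", "flu": "no"},
    ]
    print(discernibility_matrix(table, ["headache", "temperature"], "flu"))
    # Reducts correspond to the prime implicants of the conjunction, over all
    # matrix entries, of the disjunction of the attributes in each entry.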