International Journal of Approximate Reasoning 53 (2012) 912–926

An efficient rough feature selection algorithm with a multi-granulation view

Jiye Liang a,∗, Feng Wang a,b, Chuangyin Dang b, Yuhua Qian a

a Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, School of Computer and Information Technology, Shanxi University, Taiyuan 030006, Shanxi, China
b Department of System Engineering and Engineering Management, City University of Hong Kong, Hong Kong

∗ Corresponding author. Tel./fax: +86 0351 7018176.
E-mail addresses: ljy@sxu.edu.cn (J. Liang), sxuwangfeng@126.com (F. Wang), mecdang@cityu.edu.hk (C. Dang), jinchengqyh@126.com (Y. Qian).

ARTICLE INFO

Article history:
Received 15 April 2011
Received in revised form 27 February 2012
Accepted 29 February 2012
Available online 13 March 2012

Keywords:
Feature selection
Multi-granulation view
Rough set theory
Large-scale data sets

ABSTRACT

Feature selection is a challenging problem in many areas such as pattern recognition, machine learning and data mining. Rough set theory, as a valid soft computing tool for analyzing various types of data, has been widely applied to select helpful features (also called attribute reduction). Many feature selection algorithms based on rough set theory have been developed in the literature; however, they are very time-consuming when data sets are large. To overcome this limitation, we propose in this paper an efficient rough feature selection algorithm for large-scale data sets, which is inspired by multi-granulation. A sub-table of a data set can be considered as a small granularity. Given a large-scale data set, the algorithm first selects different small granularities and then estimates on each small granularity the reduct of the original data set. By fusing all of these estimates together, the algorithm obtains an approximate reduct. Because the total time spent on computing reducts for the sub-tables is much less than that for the original large-scale table, the algorithm yields a feature subset (the approximate reduct) in a much shorter time. According to several decision performance measures, experimental results show that the proposed algorithm is feasible and efficient for large-scale data sets.

© 2012 Elsevier Inc. All rights reserved.
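The following Python sketch illustrates the general idea described in the abstract: several sub-tables are drawn as small granularities, a reduct is estimated on each, and the estimates are fused into an approximate reduct. The uniform sampling of objects, the placeholder function heuristic_reduct, and the majority-vote fusion rule are illustrative assumptions only, not the exact procedure developed in this paper.

    import random

    def multi_granulation_reduct(table, n_subtables, subtable_size, heuristic_reduct):
        # table: list of objects (rows); heuristic_reduct: any conventional
        # heuristic reduction algorithm applied to a sub-table, returning a
        # set of attributes that estimates a reduct of the original table.
        votes = {}
        for _ in range(n_subtables):
            # A sub-table (small granularity) is a random sample of objects.
            subtable = random.sample(table, subtable_size)
            for attr in heuristic_reduct(subtable):
                votes[attr] = votes.get(attr, 0) + 1
        # Fusion step: the estimates are fused by a simple majority vote
        # over the sub-table reducts (an assumption made for illustration).
        return {attr for attr, count in votes.items() if count > n_subtables / 2}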
1. Introduction

As a common technique for data preprocessing in pattern recognition, machine learning and data mining, feature selection has attracted much attention in recent years [5–7,20,21,23,26,30,40]. In practice, databases nowadays grow quickly not only in the number of rows (objects) but also in the number of columns (features). Tens, hundreds or even thousands of features are stored in databases in some real-world applications, which results in high-dimensional data. However, only a limited number of features are useful in practice; an excessive number of features may significantly slow down the learning process, and irrelevant or redundant features may deteriorate the performance of learning algorithms [12,13,38]. To ease this situation, it is desirable to remove redundant features and select informative ones, so as to decrease the cost of measuring, storing and transmitting data, shorten the processing time, and obtain more compact classification models with better generalization.

Rough set theory, proposed by Pawlak [31–33], is a relatively new soft computing tool for the analysis of vague descriptions of objects, and has become a popular mathematical framework for pattern recognition, image processing, feature selection, rule extraction, neuro-computing, conflict analysis, decision support, granular computing, data mining and knowledge discovery from large data sets [3,4,8,28,36,50,51]. In rough set theory, an important concept is attribute reduction (or approximate reduct), which can be considered a specific kind of feature selection. In other words, based on rough set theory, one can select useful features from a given data table. Attribute reduction does not attempt to maximize class separability but rather to retain the discernibility of the original features for the objects in the universe [15,16,41,44,52].

As one of the most important research topics accompanying the fast development of rough set theory, attribute reduction has attracted wide attention, and many attribute reduction techniques have been developed over the last twenty years. Applying the discernibility matrix, Skowron [42] proposed an attribute reduction algorithm based on computing a disjunctive normal form, which is able to obtain all attribute reducts of a given table; however, finding the minimal reduct of a decision table is an NP-hard problem. Kryszkiewicz and Lasek [22] proposed an approach to computing the minimal set of attributes that functionally determine a decision attribute. These two attribute reduction algorithms are usually computationally very expensive, especially when dealing with large-scale data sets of high dimension. To overcome this difficulty, many heuristic attribute reduction algorithms have been developed in rough set theory [11,13,24,25,39,35,43,45,46,48]. A heuristic attribute reduction algorithm can extract a single reduct from a given table in a relatively short time. To further reduce the computational time, Qian et al. [37] developed, on the basis of four kinds of common heuristic reduction algorithms, a common accelerator to improve the time efficiency of a heuristic attribute reduction algorithm.
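As a small illustration of the discernibility-matrix construction underlying Skowron's approach, the following Python sketch builds the matrix for a toy decision table. The table and attribute names are invented for the example, and the step of extracting all reducts from the resulting disjunctive normal form is omitted.

    def discernibility_matrix(objects, condition_attrs, decision_attr):
        # Entry (i, j) collects the condition attributes on which objects i and j
        # differ, recorded only for pairs with different decision values.
        matrix = {}
        for i, x in enumerate(objects):
            for j in range(i + 1, len(objects)):
                y = objects[j]
                if x[decision_attr] != y[decision_attr]:
                    matrix[(i, j)] = {a for a in condition_attrs if x[a] != y[a]}
        return matrix

    # A toy decision table (invented for illustration).
    table = [
        {"headache": "yes", "temperature": "high",   "flu": "yes"},
        {"headache": "no",  "temperature": "high",   "flu": "yes"},
        {"headache": "no",  "temperature": "normal", "flu": "no"},
    ]
    print(discernibility_matrix(table, ["headache", "temperature"], "flu"))
    # Reducts correspond to the prime implicants of the conjunction, over all
    # matrix entries, of the disjunction of the attributes in each entry.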