A Survey of Evolutionary Algorithms for Data Mining and Knowledge Discovery

Alex A. Freitas
Postgraduate Program in Computer Science, Pontificia Universidade Catolica do Parana
Rua Imaculada Conceicao, 1155. Curitiba - PR. 80215-901. Brazil.
E-mail: alex@ppgia.pucpr.br
Web page: ~alex

Abstract: This chapter discusses the use of evolutionary algorithms, particularly genetic algorithms and genetic programming, in data mining and knowledge discovery. We focus on the data mining task of classification. In addition, we discuss some preprocessing and postprocessing steps of the knowledge discovery process, focusing on attribute selection and pruning of an ensemble of classifiers. We show how the requirements of data mining and knowledge discovery influence the design of evolutionary algorithms. In particular, we discuss how individual representation, genetic operators and fitness functions have to be adapted for extracting high-level knowledge from data.

1. Introduction

The amount of data stored in databases continues to grow fast. Intuitively, this large amount of stored data contains valuable hidden knowledge, which could be used to improve the decision-making process of an organization. For instance, data about previous sales might contain interesting relationships between products and customers. The discovery of such relationships can be very useful to increase the sales of a company. However, the number of human data analysts grows at a much smaller rate than the amount of stored data. Thus, there is a clear need for (semi-)automatic methods for extracting knowledge from data.

This need has led to the emergence of a field called data mining and knowledge discovery [66]. This is an interdisciplinary field, using methods of several research areas (especially machine learning and statistics) to extract high-level knowledge from real-world data sets. Data mining is the core step of a broader process, called knowledge discovery in databases, or knowledge discovery, for short. This process includes the application of several preprocessing methods aimed at facilitating the application of the data mining algorithm and postprocessing methods aimed at refining and improving the discovered knowledge.

This chapter discusses the use of evolutionary algorithms (EAs), particularly genetic algorithms (GAs) [29], [47] and genetic programming (GP) [41], [6], in data mining and knowledge discovery. We focus on the data mining task of classification, which is the task addressed by most EAs that extract high-level knowledge from data. In addition, we discuss the use of EAs for performing some preprocessing and postprocessing steps of the knowledge discovery process, focusing on attribute selection and pruning of an ensemble of classifiers.

We show how the requirements of data mining and knowledge discovery influence the design of EAs. In particular, we discuss how individual representation, genetic operators and fitness functions have to be adapted for extracting high-level knowledge from data.

This chapter is organized as follows. Section 2 presents an overview of data mining and knowledge discovery. Section 3 discusses several aspects of the design of GAs for rule discovery. Section 4 discusses GAs for performing some preprocessing and postprocessing steps of the knowledge discovery process. Section 5 addresses the use of GP in rule discovery. Section 6 addresses the use of GP in the preprocessing phase of the knowledge discovery process. Finally, section 7 presents a discussion that concludes the chapter.

2. An Overview of Data Mining and Knowledge Discovery

This section is divided into three parts. Subsection 2.1 discusses the desirable properties of discovered knowledge. Subsection 2.2 reviews the main data mining tasks. Subsection 2.3 presents an overview of the knowledge discovery process.

2.1 The Desirable Properties of Discovered Knowledge

In essence, data mining consists of the (semi-)automatic extraction of knowledge from data. This statement raises the question of what kind of knowledge we should try to discover. Although this is a subjective issue, we can mention three general properties that the discovered knowledge should satisfy; namely, it should be accurate, comprehensible, and interesting. Let us briefly discuss each of these properties in turn. (See also section 3.3.)

As will be seen in the next subsection, in data mining we are often interested in discovering knowledge which has a certain predictive power. The basic idea is to predict the value that some attribute(s) will take on in “the future”, based on previously observed data. In this context, we want the discovered knowledge to have a high predictive accuracy rate.

We also want the discovered knowledge to be comprehensible for the user. This is necessary whenever the discovered knowledge is to be used for supporting a decision to be made by a human being. If the discovered “knowledge” is just a black box, which makes predictions without explaining them, the user may not trust it [48]. Knowledge comprehensibility can be achieved by using high-level knowledge representations. A popular one, in the context of data mining, is a set of IF-THEN (prediction) rules, where each rule is of the form:

IF some_conditions_are_satisfied THEN predict_some_value_for_an_attribute

The third property, knowledge interestingness, is the most difficult one to define and quantify, since it is, to a large extent, subjective. However, there are some aspects of knowledge interestingness that can be defined in objective terms. The topic of rule interestingness, including a comparison between the subjective and the objective approaches for measuring rule interestingness, will be discussed in section 2.3.2.

2.2 Data Mining Tasks

In this section we briefly review some of the main data mining tasks. Each task can be
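
The IF-THEN rule form introduced in section 2.1 can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: the names Rule, covers and rule_accuracy, the attributes age, salary and buys, and the five records are all hypothetical and do not come from the chapter. Accuracy is computed here as the fraction of covered records whose goal attribute matches the prediction, which is one simple choice among several and is measured on the same records rather than on unseen data.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


@dataclass
class Rule:
    """IF every condition holds THEN predict `predicted_value` for `goal_attribute`."""
    conditions: List[Tuple[str, Callable[[Any], bool]]]  # (attribute name, test on its value)
    goal_attribute: str
    predicted_value: Any

    def covers(self, record: Record) -> bool:
        # The rule antecedent is a conjunction: all conditions must be satisfied.
        return all(test(record[attr]) for attr, test in self.conditions)


def rule_accuracy(rule: Rule, records: List[Record]) -> float:
    """Fraction of covered records for which the rule's prediction is correct."""
    covered = [r for r in records if rule.covers(r)]
    if not covered:
        return 0.0  # assumption: a rule covering nothing gets accuracy 0
    correct = sum(1 for r in covered if r[rule.goal_attribute] == rule.predicted_value)
    return correct / len(covered)


# Hypothetical records, invented purely for illustration.
records = [
    {"age": 25, "salary": 3500, "buys": "yes"},
    {"age": 28, "salary": 4000, "buys": "no"},
    {"age": 45, "salary": 5000, "buys": "yes"},
    {"age": 22, "salary": 1000, "buys": "no"},
    {"age": 29, "salary": 3000, "buys": "yes"},
]

# IF age < 30 AND salary >= 3000 THEN buys = "yes"
rule = Rule(
    conditions=[("age", lambda v: v < 30), ("salary", lambda v: v >= 3000)],
    goal_attribute="buys",
    predicted_value="yes",
)

print(rule_accuracy(rule, records))
```

In this toy data the rule covers three records and predicts two of them correctly. A rule of this kind stays readable for a human user, which is the comprehensibility property discussed above, while its accuracy can still be quantified.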
