A Survey of Evolutionary Algorithms for Data Mining and Knowledge Discovery

Alex A. Freitas

Postgraduate Program in Computer Science, Pontificia Universidade Catolica do Parana
Rua Imaculada Conceicao, 1155. Curitiba-PR. 80215-901. Brazil.
E-mail: alex@ppgia.pucpr.br
Web page: ~alex

Abstract: This chapter discusses the use of evolutionary algorithms, particularly genetic algorithms and genetic programming, in data mining and knowledge discovery. We focus on the data mining task of classification. In addition, we discuss some preprocessing and postprocessing steps of the knowledge discovery process, focusing on attribute selection and pruning of an ensemble of classifiers. We show how the requirements of data mining and knowledge discovery influence the design of evolutionary algorithms. In particular, we discuss how individual representation, genetic operators and fitness functions have to be adapted for extracting high-level knowledge from data.

1. Introduction

The amount of data stored in databases continues to grow fast. Intuitively, this large amount of stored data contains valuable hidden knowledge, which could be used to improve the decision-making process of an organization. For instance, data about previous sales might contain interesting relationships between products and customers. The discovery of such relationships can be very useful for increasing the sales of a company. However, the number of human data analysts grows at a much smaller rate than the amount of stored data. Thus, there is a clear need for (semi-)automatic methods for extracting knowledge from data.

This need has led to the emergence of a field called data mining and knowledge discovery [66]. This is an interdisciplinary field, using methods of several research areas (especially machine learning and statistics) to extract high-level knowledge from real-world data sets. Data mining is the core step of a broader process, called knowledge discovery in databases, or knowledge discovery, for short. This process includes the application of several preprocessing methods, aimed at facilitating the application of the data mining algorithm, and postprocessing methods, aimed at refining and improving the discovered knowledge.

This chapter discusses the use of evolutionary algorithms (EAs), particularly genetic algorithms (GAs) [29], [47] and genetic programming (GP) [41], [6], in data mining and knowledge discovery. We focus on the data mining task of classification, which is the task addressed by most EAs that extract high-level knowledge from data. In addition, we discuss the use of EAs for performing some preprocessing and postprocessing steps of the knowledge discovery process, focusing on attribute selection and pruning of an ensemble of classifiers. We show how the requirements of data mining and knowledge discovery influence the design of EAs. In particular, we discuss how individual representation, genetic operators and fitness functions have to be adapted for extracting high-level knowledge from data.

This chapter is organized as follows. Section 2 presents an overview of data mining and knowledge discovery. Section 3 discusses several aspects of the design of GAs for rule discovery. Section 4 discusses GAs for performing some preprocessing and postprocessing steps of the knowledge discovery process. Section 5 addresses the use of GP in rule discovery. Section 6 addresses the use of GP in the preprocessing phase of the knowledge discovery process. Finally, section 7 presents a discussion that concludes the chapter.

2. An Overview of Data Mining and Knowledge Discovery

This section is divided into three parts. Subsection 2.1 discusses the desirable properties of discovered knowledge. Subsection 2.2 reviews the main data mining tasks. Subsection 2.3 presents an overview of the knowledge discovery process.

2.1 The Desirable Properties of Discovered Knowledge

In essence, data mining consists of the (semi-)automatic extraction of knowledge from data. This statement raises the question of what kind of knowledge we should try to discover. Although this is a subjective issue, we can mention three general properties that the discovered knowledge should satisfy; namely, it should be accurate, comprehensible, and interesting. Let us briefly discuss each of these properties in turn. (See also section 3.3.)

As will be seen in the next subsection, in data mining we are often interested in discovering knowledge which has a certain predictive power. The basic idea is to predict the value that some attribute(s) will take on in “the future”, based on previously observed data. In this context, we want the discovered knowledge to have a high predictive accuracy rate.

We also want the discovered knowledge to be comprehensible to the user. This is necessary whenever the discovered knowledge is to be used to support a decision made by a human being. If the discovered “knowledge” is just a black box, which makes predictions without explaining them, the user may not trust it [48]. Knowledge comprehensibility can be achieved by using high-level knowledge representations. A popular one, in the context of data mining, is a set of IF-THEN (prediction) rules, where each rule is of the form:

IF some_conditions_are_satisfied THEN predict_some_value_for_an_attribute

The third property, knowledge interestingness, is the most difficult one to define and quantify, since it is, to a large extent, subjective. However, there are some aspects of knowledge interestingness that can be defined in objective terms. The topic of rule interestingness, including a comparison between the subjective and the objective approaches for measuring rule interestingness, will be discussed in section 2.3.2.

2.2 Data Mining Tasks

In this section we briefly review some of the main data mining tasks. Each task can be