GESCONDA: A Tool for Knowledge Discovery and Data Mining in Environmental Databases

Miquel Sànchez-Marrè¹, Karina Gibert¹,² and Ignasi Rodríguez-Roda³

¹ Knowledge Engineering and Machine Learning group (KEMLG), Universitat Politècnica de Catalunya, Campus Nord-Edifici C5, Jordi Girona 1-3, 08034 Barcelona, Catalonia, EU
miquel@lsi.upc.es
² Dep. Statistics and Operation Research, Universitat Politècnica de Catalunya, Pau Gargallo 5. 08028 Barcelona, Catalonia, EU
karina@eio.upc.es
³ Laboratori d'Enginyeria Química i Ambiental (LEQUIA), Universitat de Girona, Campus de Montilivi s/n, 17071 Girona, Catalonia, EU
ignasi@lequia.udg.es

Abstract. In this paper, a tool for Knowledge Discovery and Data Mining in environmental databases is presented. In the long term, the main goal of this research is to design and develop a tool, named GESCONDA, for intelligent data analysis and management of implicit knowledge from databases; it will provide support to Knowledge Discovery and Data Mining tasks that can guide the decision-making process, with special focus on environmental databases. The first stage of the project is to develop a prototype of the tool. In contrast to existing commercial systems, the most relevant aspects of this proposal are the possibility of interaction between the developed methods, the development of mixed techniques (combining tools from different fields, such as AI or Statistics, that cooperate with one another) to extract the knowledge contained in data, the existence of dynamical data analysis, and the existence of a recommender agent, which will suggest the best method to be used depending on the target domain and on the goals specified by users. The purpose of the paper is to present the architecture of the system as well as its functionality, and to illustrate some of the possibilities of supporting knowledge discovery and data mining on real environmental domains. Finally, the use of GESCONDA in the context of environmental systems is presented, as well as results obtained in a concrete case study.

1 Introduction

An Intelligent Environmental Decision Support System (IEDSS) can be defined as an intelligent information system for decreasing the decision-making time and improving the consistency and quality of decisions in Environmental Systems. An IEDSS is an ideal decision-oriented tool for suggesting recommendations in an environmental domain. The main outstanding feature of an IEDSS is the knowledge it embodies, which provides the system with enhanced abilities to reason about the environmental system in a more reliable way. A common problem in the development of such systems is how to obtain that knowledge. Classic approaches are based on obtaining the knowledge through manual interactive sessions between environmental experts and knowledge engineers. However, when databases summarising the historical behaviour of the environmental system are available, a more interesting and promising approach is suitable: that of using several well-known automated techniques from both the Statistics and Machine Learning fields to get the knowledge. The use of all those techniques together for analysing data is usually called Data Mining or Knowledge Discovery.

Large databases coming from the monitoring of any system or dynamical environmental process contain a remarkably high quantity of information and implicit knowledge patterns. Examples include historical data collected about meteorological phenomena in a certain area, about the performance of a wastewater treatment plant, about the characterization of environmental emergencies (toxic substances wasting, inflammable gas expansion), or about the geomorphological description of seismic activity. All this information and knowledge is very important for prediction tasks, control, supervision and minimisation of environmental impact, both on Nature and on human beings themselves.

This research is concerned with building an Intelligent Data Analysis System (IDAS) to provide support to these kinds of environmental systems. The tool is basically composed of several statistical data analysis methods, such as one-way and two-way descriptive statistics, missing data analysis, clustering, or modelling relationships between variables. Also, several machine learning techniques, coming from Artificial Intelligence, are integrated in the same tool, such as clustering, classification, rule induction, decision tree induction, case-based reasoning techniques, support vector machines, and dynamical analysis. The system is also provided with a higher-level component, which allows information exchange between one method and another, and with a recommender that, given the problem, can suggest the most appropriate analysis.

In the literature, there are some related works and financed research projects on the active machine learning field in environmental data mining, such as in [19], [6], [9] and [2]. Also, some European research centres are involved in the project EDAM (INTAS 99-00099) [17], a research project on environmental data mining, learning algorithms and statistical tools for monitoring and forecasting. However, the authors are not aware of the existence of specific software for knowledge discovery and data mining of environmental databases that takes into account the special features of environmental domains, such as the temporal and dynamic component of data, the problem of noisy data, and data filtering with no clear relevant or irrelevant features. In fact, these are major differences with other commercial or freeware software, such as WEKA. Our work aims at designing and building an Intelligent Knowledge Data Discovery and Data Mining System, especially suitable for environmental data analysis. The software tool, which is called GESCONDA [14], [16], [26], is, at present, partially built, and i