Mach Learn (2006) 62:33–63
DOI 10.1007/s10994-006-5834-0

Propositionalization-based relational subgroup discovery with RSD

Filip Železný · Nada Lavrač

Received: 24 February 2003 / Revised: 1 December 2004 / Accepted: 27 July 2005 / Published online: 27 January 2006
© Springer Science+Business Media, Inc. 2006

Editors: Hendrik Blockeel, David Jensen and Stefan Kramer

F. Železný
Czech Technical University, Prague, Czech Republic
e-mail: zelezny@fel.cvut.cz

N. Lavrač
Institute Jožef Stefan, Ljubljana, Slovenia, and Nova Gorica Polytechnic, Nova Gorica, Slovenia
e-mail: nada.lavrac@ijs.si

Abstract Relational rule learning algorithms are typically designed to construct classification and prediction rules. However, relational rule learning can be adapted also to subgroup discovery. This paper proposes a propositionalization approach to relational subgroup discovery, achieved through appropriately adapting rule learning and first-order feature construction. The proposed approach was successfully applied to standard ILP problems (East-West trains, King-Rook-King chess endgame and mutagenicity prediction) and two real-life problems (analysis of telephone calls and traffic accident analysis).

Keywords Relational data mining. Propositionalization. Feature construction. Subgroup discovery

1. Introduction

Classical rule learning algorithms are designed to construct classification and prediction rules (Michie et al., 1994; Clark & Niblett, 1989; Cohen, 1995). The goal of these predictive induction algorithms is to induce classification/prediction models consisting of a set of rules. On the other hand, opposed to model induction, descriptive induction algorithms (De Raedt & Dehaspe, 1997; Wrobel & Džeroski, 1995) aim to discover patterns described in the form of individual rules. Descriptive induction algorithms include association rule learners (e.g., APRIORI (Agrawal et al., 1996)), clausal discovery systems (e.g., CLAUDIEN (De Raedt & Dehaspe, 1997; De Raedt et al., 2001)), and subgroup discovery systems (e.g., MIDOS (Wrobel, 1997; Wrobel, 2001), EXPLORA (Kloesgen, 1996) and SubgroupMiner (Kloesgen & May, 2002)).

This paper investigates relational subgroup discovery. As in the MIDOS relational subgroup discovery system, a
subgroup discovery task is defined as follows: Given a population of individuals and a property of individuals we are interested in, find population subgroups that are statistically ‘most interesting’, e.g., are as large as possible and have the most unusual statistical (distributional) characteristics with respect to the property of interest.

Notice an important aspect of the above definition: there is a predefined property of interest, meaning that a subgroup discovery task aims at characterizing population subgroups of a given target class. This property indicates that standard classification rule learning algorithms could be used for solving the task. However, while the goal of classification rule learning is to generate models (sets of rules), inducing class descriptions in terms of properties occurring in the descriptions of training examples, in contrast, subgroup discovery aims at discovering individual patterns of interest (individual rules describing the target class).

This paper proposes to adapt classification rule learning to relational subgroup discovery, based on principles that employ the following main ingredients: propositionalization through first-order feature construction, feature filtering, incorporation of example weights into the weighted relative accuracy search heuristic, and implementation of the weighted covering algorithm. Most of the above-listed elements conform to the subgroup discovery methodology proposed by Lavrač et al. (2004); for completeness, these elements are described in Section 3. The main contributions of this paper concern the transfer of this methodology to the multi-relational learning setting. The contributions include substantial improvements of the propositionalization step (compared to the propositionalization proposed by Flach and Lachiche (1999) and Lavrač and Flach (2001)) and an effective implementation of relational subgroup discovery algorithm RSD, employing language and evaluation constraints. Further contributions concern the analysis of the RSD subgroup discovery algorithm in the ROC space, and the successful application of RSD to standard ILP problems (East-West trains, King-Rook-King chess endgame and mutagenicity prediction) and two real-life problem domains (analysis of telephone calls and analysis of traffic accidents).

RSD is available at ∼zelezny/rsd/. This web page gives access to the RSD system, the user’s manual, the datasets (Trains, KRK, Mutagenesis, Telecom)¹ and the related parameter setting declarations, which enable the reproduction of the experimental results of this paper.

The paper is organized as follows. Section 2 specifies the relational subgroup discovery task, illustrating first-order feature construction and results of rule induction on the well-known East-West challenge learning problem. It also defines criteria for evaluating the results of subgroup discovery algorithms. In Section 3, the background of this work is explained, including pointers to the related work. Sections 4 and 5 present the main ingredients of the RSD subgroup discovery algorithm: propositionalization through efficient first-order feature construction and constraint-based induction of subgroup descriptions, respectively. Section 6 describes the experimental domains. The results of experiments are presented in Sections 7 and 8. Section 9 concludes by summarizing the results and presenting plans for further work.

¹ The Traffic dataset is not availabl
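
The introduction names two ingredients without giving their formulas at this point: example weights in the weighted relative accuracy heuristic, and the weighted covering algorithm. The following is a minimal illustrative sketch, not the RSD implementation. It assumes a common formulation from the subgroup discovery literature, in which a rule's score is (weighted coverage) × (weighted precision − weighted target prior), and an additive scheme in which a target example covered by k selected rules gets weight 1/(k+1). The function names `weighted_wracc` and `weighted_covering` and the toy data are hypothetical.

```python
from typing import List, Sequence


def weighted_wracc(covered: Sequence[bool], is_target: Sequence[bool],
                   weights: Sequence[float]) -> float:
    """Weighted relative accuracy of a rule 'target <- condition'.

    covered[i]:   True if example i satisfies the rule condition
    is_target[i]: True if example i has the property of interest
    weights[i]:   current weight of example i
    """
    total = sum(weights)
    cov = sum(w for w, c in zip(weights, covered) if c)
    if total == 0 or cov == 0:
        return 0.0
    cov_target = sum(w for w, c, t in zip(weights, covered, is_target) if c and t)
    target = sum(w for w, t in zip(weights, is_target) if t)
    # coverage * (precision on the subgroup - prior of the target class)
    return (cov / total) * (cov_target / cov - target / total)


def weighted_covering(candidates: List[Sequence[bool]],
                      is_target: Sequence[bool],
                      max_rules: int = 3) -> List[int]:
    """Greedily select candidate rules (by index), down-weighting rather
    than removing the target examples each selected rule covers."""
    counts = [0] * len(is_target)      # selected rules covering each example
    weights = [1.0] * len(is_target)
    chosen: List[int] = []
    for _ in range(max_rules):
        scored = [(weighted_wracc(c, is_target, weights), i)
                  for i, c in enumerate(candidates) if i not in chosen]
        if not scored:
            break
        best_score, best = max(scored)
        if best_score <= 0:            # no remaining rule is unusual enough
            break
        chosen.append(best)
        for i, c in enumerate(candidates[best]):
            if c and is_target[i]:
                counts[i] += 1
                weights[i] = 1.0 / (counts[i] + 1)   # additive scheme (assumed)
    return chosen


# Toy population: 8 individuals, the first 4 have the property of interest.
is_target = [True, True, True, True, False, False, False, False]
rule_a = [True, True, True, False, False, False, False, False]
rule_b = [True, True, True, False, True, False, False, False]
rule_c = [False, False, False, True, False, True, False, False]

print(weighted_wracc(rule_a, is_target, [1.0] * 8))
print(weighted_covering([rule_a, rule_b, rule_c], is_target))
```

Because covered examples keep a reduced weight instead of being discarded, a later rule that overlaps an earlier one can still be selected, which matches the stated goal of finding individual patterns of interest rather than a single classification model.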