您好,欢迎访问三七文档
当前位置:首页 > IT计算机/网络 > 数据挖掘与识别 > 机器学习十大算法:Apriori
Chapter4AprioriHiroshiMotodaandKouzouOharaContents4.1Introduction............................................................624.2AlgorithmDescription..................................................624.2.1MiningFrequentPatternsandAssociationRules..................624.2.1.1Apriori.................................................634.2.1.2AprioriTid..............................................664.2.2MiningSequentialPatterns.......................................674.2.2.1AprioriAll..............................................684.2.3Discussion.......................................................694.3DiscussiononAvailableSoftwareImplementations......................704.4TwoIllustrativeExamples...............................................714.4.1WorkingExamples..............................................714.4.1.1FrequentItemsetandAssociationRuleMining..........714.4.1.2SequentialPatternMining...............................754.4.2PerformanceEvaluation..........................................764.5AdvancedTopics........................................................804.5.1ImprovementinApriori-TypeFrequentPatternMining...........804.5.2FrequentPatternMiningWithoutCandidateGeneration...........814.5.3IncrementalApproach...........................................824.5.4CondensedRepresentation:ClosedPatternsandMaximalPatterns............................................824.5.5QuantitativeAssociationRules...................................844.5.6UsingOtherMeasureofImportance/Interestingness..............844.5.7ClassAssociationRules..........................................854.5.8UsingRicherExpression:Sequences,Trees,andGraphs..........864.6Summary...............................................................874.7Exercises...............................................................88References...................................................................8961©2009byTaylor&FrancisGroup,LLC62Apriori4.1IntroductionManyofthepatternfindingalgorithmssuchasthosefordecisiontreebuilding,clas-sificationruleinduction,anddataclusteringthatarefrequentlyusedindatamininghavebeendevelopedinthemachinelearningresearchcommunity.Frequentpatternandassociationruleminingisoneofthefewexceptionstothistradition.Itsintroduc-tionboosteddataminingresearchanditsimpactistremendous.Thebasicalgorithmsaresimpleandeasytoimplement.Inthischapterthemostfundamentalalgorithmsoffrequentpatternandassociationrulemining,knownasAprioriandAprioriTid[3,4],andApriori’sextensiontosequentialpatternmining,knownasAprioriAll[6,5],areexplainedbasedontheoriginalpaperswithworkingexamples,andperformanceanalysisofAprioriisshownusingafreelyavailableimplementation[1]foradatasetinUCIrepository[8].SinceAprioriissofundamentalandtheformofdatabaseislimitedtomarkettransaction,therehavebeenmanyworksforimprovingcompu-tationalefficiency,findingmorecompactrepresentation,andextendingthetypesofdatathatcanbehandled.Someoftheimportantworksarealsobrieflydescribedasadvancedtopics.4.2AlgorithmDescription4.2.1MiningFrequentPatternsandAssociationRulesOneofthemostpopulardataminingapproachesistofindfrequentitemsetsfromatransactiondatasetandderiveassociationrules.Theproblemisformallystatedasfollows.LetI={i1,i2,...,im}beasetofitems.LetDbeasetoftransactions,whereeachtransactiontisasetofitemssuchthatt⊆I.Eachtransactionhasauniqueidentifier,calleditsTID.AtransactiontcontainsX,asetofsomeitemsinI,ifX⊆t.AnassociationruleisanimplicationoftheformX⇒Y,whereX⊂I,Y⊂I,andX∩Y=∅.TheruleX⇒YholdsinDwithconfidencec(0≤c≤1)ifthefractionoftransactionsthatalsocontainYinthosewhichcontainXinDisc.TheruleX⇒Y(andequivalentlyX∪Y)hassupport1s(0≤s≤1)inDifthefractionoftransactionsinDthatcontainX∪Yiss.GivenasetoftransactionsD,theproblemofminingassociationrulesistogenerateallassociationrulesthathavesupportandconfidencenolessthantheuser-specifiedminimumsupport(calledminsup)andminimumconfidence(calledminconf),respectively.Findingfrequent2itemsets(itemsetswithsupportnolessthanminsup)isnottri-vialbecauseofthecomputationalcomplexityduetocombinatorialexplosion.Once1Analternativesupportdefinitionistheabsolutecountoffrequency.Inthischapterthelatterdefinitionisalsousedwhereappropriate.2TheAprioripaper[3]uses“large”tomean“frequent,”butlargeisoftenassociatedwiththenumberofitemsintheitemset.Thus,weprefertouse“frequent.”©2009byTaylor&FrancisGroup,LLC4.2AlgorithmDescription63frequentitemsetsareobtained,itisstraightforwardtogenerateassociationruleswithconfidencenolessthanminconf.AprioriandAprioriTid,proposedbyR.AgrawalandR.Srikant,areseminalalgorithmsthataredesignedtoworkforalargetransactiondataset[3].4.2.1.1AprioriAprioriisanalgorithmtofindallsetsofitems(itemsets)thathavesupportnolessthanminsup.Thesupportforanitemsetistheratioofthenumberoftransactionsthatcontaintheitemsettothetotalnumberoftransactions.Itemsetsthatsatisfyminimumsupportconstraintarecalledfrequentitemsets.Aprioriischaracterizedasalevel-wisecompletesearch(breadthfirstsearch)algorithmusinganti-monotonicitypropertyofitemsets:“Ifanitemsetisnotfrequent,anyofitssupersetisneverfrequent,”whichi
本文标题:机器学习十大算法:Apriori
链接地址:https://www.777doc.com/doc-4208123 .html