您好,欢迎访问三七文档
当前位置:首页 > IT计算机/网络 > 数据挖掘与识别 > Apriori关联分析
AprioriAlgorithmHongjieWuAbstractPart1AprioriAlgorithmPart2AdvantagesDisadvantagesPart3InstancePart4目录AprioriisaseminalalgorithmproposedbyR.AgrawalandR.Srikantin1994forminingfrequentitemsetsforBooleanassociationrules.Thenameofthealgorithmisbasedonthefactthatthealgorithmusespriorknowledgeoffrequentitemsetproperties,asweshallseefollowing.Aprioriemploysaniterativeapproachknownasalevel-wisesearch,wherek-itemsetsareusedtoexplore(k+1)-itemsets.First,thesetoffrequent1-itemsetsisfoundbyscanningthedatabasetoaccumulatethecountforeachitem,andcollectingthoseitemsthatsatisfyminimumsupport.TheresultingsetisdenotedL1.Next,L1isusedtofindL2,thesetoffrequent2-itemsets,whichisusedtofindL3,andsoon,untilnomorefrequentk-itemsetscanbefound.ThefindingofeachLkrequiresonefullscanofthedatabase.Aprioriproperty:Allnonemptysubsetsofafrequentitemsetmustalsobefrequent.Rulesthatsatisfybothaminimumsupportthreshold(minsup)andaminimumconfidencethreshold(minconf)arecalledstrong.1.Thejoinstep:TofindLk,asetofcandidatek-itemsetsisgeneratedbyjoiningLk-1withitself.ThissetofcandidatesisdenotedCk.Letl1andl2beitemsetsinLk-1.Thenotationli[j]referstothejthiteminli(e.g.,l1[k-2]referstothesecondtothelastiteminl1).Byconvention,Aprioriassumesthatitemswithinatransactionoritemsetaresortedinlexicographicorder.Forthe(k-1)-itemset,li,thismeansthattheitemsaresortedsuchthatli[1]li[2]...li[k-1].Thejoin,Lk-1onLk-1,isperformed,wheremembersofLk-1arejoinableiftheirfirst(k-2)itemsareincommon.Thatis,membersl1andl2ofLk-1arejoinedif(l1[1]=l2[1])^(l1[2]=l2[2])^...^(l1[k-2]=l2[k-2])^(l1[k-1]l2[k-1]).Theconditionl1[k-1]l2[k-1]simplyensuresthatnoduplicatesaregenerated.Theresultingitemsetformedbyjoiningl1andl2isl1[1],l1[2],...,l1[k-2],l1[k-1],l2[k-1].2.Theprunestep:CkisasupersetofLk,thatis,itsmembersmayormaynotbefrequent,butallofthefrequentk-itemsetsareincludedinCk.AscanofthedatabasetodeterminethecountofeachcandidateinCkwouldresultinthedeterminationofLk(i.e.,allcandidateshavingacountnolessthantheminimumsupportcountarefrequentbydefinition,andthereforebelongtoLk).Ck,however,canbehuge,andsothiscouldinvolveheavycomputation.ToreducethesizeofCk,theAprioripropertyisusedasfollows.Any(k-1)-itemsetthatisnotfrequentcannotbeasubsetofafrequentk-itemset.Hence,ifany(k-1)-subsetofacandidatek-itemsetisnotinLk-1,thenthecandidatecannotbefrequenteitherandsocanberemovedfromCk.Thissubsettestingcanbedonequicklybymaintainingahashtreeofallfrequentitemsets.Example5.3Apriori.Let’slookataconcreteexample,basedontheAllElectronicstransactiondatabase,D,ofTable5.1.Thereareninetransactionsinthisdatabase,thatis,jDj=9.WeuseFigure5.2toillustratetheApriorialgorithmforfindingfrequentitemsetsinD.1.Inthefirstiterationofthealgorithm,eachitemisamemberofthesetofcandidate1-itemsets,C1.Thealgorithmsimplyscansallofthetransactionsinordertocountthenumberofoccurrencesofeachitem.2.Supposethattheminimumsupportcountrequiredis2,thatis,minsup=2.(Here,wearereferringtoabsolutesupportbecauseweareusingasupportcount.Thecorrespondingrelativesupportis2/9=22%).Thesetoffrequent1-itemsets,L1,canthenbedetermined.Itconsistsofthecandidate1-itemsetssatisfyingminimumsupport.Inourexample,allofthecandidatesinC1satisfyminimumsupport.3.Todiscoverthesetoffrequent2-itemsets,L2,thealgorithmusesthejoinL1onL1togenerateacandidatesetof2-itemsets,C2.8C2consistsof2-itemsets.NotethatnocandidatesareremovedfromC2duringtheprunestepbecauseeachsubsetofthecandidatesisalsofrequent.4.Next,thetransactionsinDarescannedandthesupportcountofeachcandidateitemsetinC2isaccumulated,asshowninthemiddletableofthesecondrowinFigure5.2.5.Thesetoffrequent2-itemsets,L2,isthendetermined,consistingofthosecandidate2-itemsetsinC2havingminimumsupport.6.Thegenerationofthesetofcandidate3-itemsets,C3,isdetailedinFigure5.3.Fromthejoinstep,wefirstgetC3=L2onL2=ffI1,I2,I3g,fI1,I2,I5g,fI1,I3,I5g,fI2,I3,I4g,fI2,I3,I5g,fI2,I4,I5gg.BasedontheAprioripropertythatallsubsetsofafrequentitemsetmustalsobefrequent,wecandeterminethatthefourlattercandidatescannotpossiblybefrequent.WethereforeremovethemfromC3,therebysavingtheeffortofunnecessarilyobtainingtheircountsduringthesubsequentscanofDtodetermineL3.Notethatwhengivenacandidatek-itemset,weonlyneedtocheckifits(k-1)-subsetsarefrequentsincetheApriorialgorithmusesalevel-wisesearchstrategy.TheresultingprunedversionofC3isshowninthefirsttableofthebottomrowofFigure5.2.7.ThetransactionsinDarescannedinordertodetermineL3,consistingofthosecandidate3-itemsetsinC3havingminimumsupport(Figure5.2).8.ThealgorithmusesL3onL3togenerateacandidatesetof4-itemsets,C4.AlthoughthejoinresultsinffI1,I2,I3,I5gg,thisitemsetisprunedbecauseitssubsetffI2,I3,I5ggisnotfrequent.Thus,C4=f,andthealgorithmterminates,havingfoundallofthefrequentitemsets.Advantagesa.Greatlycompressthesizeofthefrequentepisodesb.AchievedgoodperformanceDisadvantagesa.Producealargenumberoffrequentsetsb.RepeatscanthetransactiondatabaseR.Agrawal•AprioriTid•AprioriALLDHPAlgorithmFP-growthAlgorithmOPAlgorithmTreeProjectionAlgorithm......AssociationRulesDiscoveringModelBasedonGeneticAlgorithmandtheimprovedApr
本文标题:Apriori关联分析
链接地址:https://www.777doc.com/doc-3813505 .html