您好,欢迎访问三七文档
当前位置:首页 > 商业/管理/HR > 经营企划 > 决策树剪枝方法的比较_魏红宁
:2004-03-12:(1966-),,.:0258-2724(2005)01-0044-05魏红宁(,610031):,4.PEP,MEP;REP,;,CCPREP.,REP,,PEP.:;;;PEP;MEP;REP;CCP:TP311:AComparisonamongMethodsofDecisionTreePruningWEIHong-ning(AdministrativeOffice,SouthwestJiaotongUniversity,Chengdu610031,China)Abstract:Toselectasuitablepruningmethodindecisiontreepruning,fourwel-lknownpruningmethodswerecomparedintermsofcomputationalcomplexity,traversalstrategy,errorestimationandtheoreticalprinciplebytakingaclassificationandregressiontreeasanexample.Comparedwithpessimisticerrorpruning(PEP),minimumerrorpruning(MEP)islessaccurateandproducesalargertree.Reducederrorpruning(REP)isoneofthesimplestpruningstrategies,butithasthedisadvantageofrequiringaseparatedatasetforpruning.Cost-complexitypruning(CCP)producesasmallertreethanREPwithsimilaraccuracy.Practically,ifthetrainingdataisabundant,REPispreferable;andifthetraindataistheexpectedaccuracyishighbutwithlimiteddata,PEPisgoodchoice.Keywords:datamining;decisiontree;postpruning;pessimisticerrorpruning;minimumerrorpruning;reducederrorpruning;cost-complexitypruning..,.,,,,.,,[1].,.,[2],.,;.,,.(1)CART(classificationandregressiontrees)[3],,.1ti40120052JOURNALOFSOUTHWESTJIAOTONGUNIVERSITYVol.40No.1Feb.20051Fig.1Exampleofdecisiontreei,A,B,A,B.1CCP(cost-complexitypruning)CCP[3]:(1)T0T0,T1,,Tn.,Ti+1Ti,Tn.(2)1,.1,{T0,T1,,Tn}T0,TiTi+1.,1Tt,R(t)-R(Tt),,R(t)tt,R(Tt)tTt.,,TL(Tt)-1,,L(Tt)Tt,,T.,,:=R(t)-R(Tt)L(Tt)-1.(1)Ti+1Ti.11.1Tab.1CalculatedofonthedecisiontreeinFig.1T0(t4)=0.0125(t5)=0.0500(t2)=0.0292(t3)=0.0375T1(t5)=0.0500(t2)=0.0375(t3)=0.0375T2(t3)=0.03751,T0,4t4,,T0t4T1;T1,t2t3,t2,,T2T1t2.1T0,T1,T2,1CCP2.,V(V-foldcross-validation),.[3].[4],T(),,CCP.,.CCPV,DVD1,D2,,DV.,D-Di(i=1,2,,V)VT1,T2,,TV.T(i)T1(ii+1),,TV(ii+1).T(i),T1(ii+1),,TV(0i+1).,,T,T1,,TV.Tj(ii+1)[5].45第1期魏红宁:决策树剪枝方法的比较T(),CCPREP,CCPT(),T.,T(),.2REP(reducederrorpruning)REPQuinlan[6],D.,TS,.SDSD,S.1REP.2(a),2(b)2(c)REP,2(a)1.2REPFig.2ExampleofREPalgorithm1,2.,,.2(b),t40,t8t91.REP,t4,2(c)..REP,.REP(overfitting),,,.,REP.[7].,,.,,,,REP.3PEP(pessimisticerrorpruning)PEPQuinlan[6]REP,.,PEP(continuitycorrection).PEP,e(t)e(Tt)+Se(e(Tt))(2),Tt.(2):e(t)=e(t)+12;(3)e(Tt)=e(i)+Nt2;(4)Se(e(Tt))=e(Tt)n(t)-e(Tt)n(t)12.(5)46西南交通大学学报第40卷:e(t)t;iTt;NtTt;n(t)t.1PEP,2.2PEP1Tab.2ResultsonthedecisiontreeinFig.1afterPEPe(t)e(Tt)Se(e(Tt))t125.582.68t210.552.14t35.531.60t44.541.92t54.510.95PEP,(2),.2,PEP1t4.PEP[2]..,PEP,,.,PEP[2].PEP,.,PEP,.,,.,PEP,,,.4MEP(minimumerrorpruning)MEPNiblettBratko[8],ID3.CestnikBratko[9],m-[10].,m.,m,.,MEP,,Er(t).,Er(Tt),,.Er(t)Er(Tt),;,.,Er(t)[8]:Er(t)=n(t)-nc(t)+(k-1)n(t)+k,(6):n(t)t;nc(t)t;k.(6)MEP,,.1MEP,3.3PEP1Tab.3ResultsonthedecisiontreeinFig.1afterMEPEr(t)Er(Tt)T40.09620.0811T50.41670.1417T20.1740.1467T30.27270.1596T10.31710.1172MEP,,,,.47第1期魏红宁:决策树剪枝方法的比较MEP,Er(t)k.MEP,(paim),Paii.m.m,,.,m,MEP,,m,.,MEP,m.CestnikBratko[9]m.,.,CCP,m.,MEP.5.,,.1,,4.4Tab.4Summaryofpost-pruningalgorithmsCCPREPPEPMEPCV:CVm-(n-)O(n2)O(n)O(n)O(n)1,CCP,REPPEP,,MEP1.,,.,,.,,.:[1]OatesT,JensenD.Theeffectsoftrainingsetsizesondecisiontree[A].Procofthe14thIntlConfonMachineLearning[C].Nashville:MorganKaufman,1997.254-262.[2]BreslowLA,AhaDW.Simplifyingdecisiontrees:asurvey[J].KnowledgeEngineeringReview,1997,12(1):1-40.[3]BreimanL,FriedmanJ,OlshenRA,etal.Classificationandregressiontrees[M].Belmont:Wadsworth,1984.1-358.[4]NobelA.Analysisofacomplexitybasedpruningschemeforclassificationtrees[J].IEEETransactionsonInformationTheory,2002,48(8):2362-2368.[5]MalerbaD,SemeraroG,EspositoF.Choosingthebestpruneddecisiontree:amatterofbias[A].Proc5thItalianWorkshoponMachineLearning[C].Parma:UniversityofParma,1994.33-37.[6]QuinlanJR.Simplifyingdecisiontrees[J].InternationalJournalofMan-MachineStudies,1987,27(3):221-234.[7]ElomaaT,KaariainenM.Ananalysisofreducederrorpruning[J].JournalofArtificialIntelligenceResearch,2001,15:163-187.[8]NiblettT,BratkoI.Learningdecisionrulesinnoisydomains[A].ProceedingsofExpertSystems86[C].Cambridge:CambridgeUninversityPress,1986.25-34.[9]CestnikB,BratkoI.Onestimatingprobabilitiesintreepruning[A].ProcofEuropeanWorkingSessionsonLearning[C].Porto:Springer-Verlag,1991.138-150(中文编辑:唐晴英文编辑:刘斌)48西南交通大学学报第40卷
本文标题:决策树剪枝方法的比较_魏红宁
链接地址:https://www.777doc.com/doc-4328281 .html