您好,欢迎访问三七文档
当前位置:首页 > 商业/管理/HR > 信息化管理 > Experiments with a New Boosting Algorithm
1、MachineLearning:ProceedingsoftheThirteenthInternationalConference,1996.ExperimentswithaNewBoostingAlgorithmYoavFreundRobertE.SchapireAT&TLaboratories600MountainAvenueMurrayHill,NJ07974-0636 yoav,schapire @research.att.comAbstract.Inanearlierpaper,weintroducedanew“boosting”algorithmcalledAdaBoostwhich,theoretically,canbeusedtosignificantlyreducetheerrorofanylearningalgorithmthatcon-sistentlygeneratesclassifierswhoseperformanceisalittlebetterthanrandomguessing.Wealsointroducedtherelatednotionofa“ps。
2、eudo-loss”whichisamethodforforcingalearningalgorithmofmulti-labelconceptstoconcentrateonthelabelsthatarehardesttodiscriminate.Inthispaper,wedescribeexperimentswecarriedouttoassesshowwellAdaBoostwithandwithoutpseudo-loss,performsonreallearningproblems.Weperformedtwosetsofexperiments.ThefirstsetcomparedboostingtoBreiman’s“bagging”methodwhenusedtoaggregatevariousclassifiers(includingdecisiontreesandsingleattribute-valuetests).Wecomparedtheperformanceofthetwomethodsonacollectionofmachine-learningbench。
3、marks.Inthesecondsetofexperiments,westudiedinmoredetailtheperformanceofboostingusinganearest-neighborclassifieronanOCRproblem.1INTRODUCTION“Boosting”isageneralmethodforimprovingtheperfor-manceofanylearningalgorithm.Intheory,boostingcanbeusedtosignificantlyreducetheerrorofany“weak”learningalgorithmthatconsistentlygeneratesclassifierswhichneedonlybealittlebitbetterthanrandomguessing.Despitethepotentialbenefitsofboostingpromisedbythetheoret-icalresults,thetruepracticalvalueofboostingcanonlybeassessedby。
4、testingthemethodonrealmachinelearningproblems.Inthispaper,wepresentsuchanexperimentalassessmentofanewboostingalgorithmcalledAdaBoost.Boostingworksbyrepeatedlyrunningagivenweak1learningalgorithmonvariousdistributionsoverthetrain-ingdata,andthencombiningtheclassifiersproducedbytheweaklearnerintoasinglecompositeclassifier.ThefirstprovablyeffectiveboostingalgorithmswerepresentedbySchapire[20]andFreund[9].Morerecently,wede-scribedandanalyzedAdaBoost,andwearguedthatthisnewboostingalgorithmhascertainprope。
5、rtieswhichmakeitmorepracticalandeasiertoimplementthanitsprede-cessors[10].Thisalgorithm,whichweusedinallourexperiments,isdescribedindetailinSection2.Homepage:“”.Expectedtochangeto“˜uid”some-timeinthenearfuture(foruid yoav,schapire ).1Weusetheterm“weak”learningalgorithm,eventhough,inpractice,boostingmightbecombinedwithaquitestronglearningalgorithmsuchasC4.5.Thispaperdescribestwodistinctsetsofexperiments.Inthefirstsetofexperiments,describedinSection3,wecomparedboostingto“bagging,”amethoddescribed。
6、byBreiman[1]whichworksinthesamegeneralfashion(i.e.,byrepeatedlyrerunningagivenweaklearningalgorithm,andcombiningthecomputedclassifiers),butwhichcon-structseachdistributioninasimplermanner.(Detailsgivenbelow.)Wecomparedboostingwithbaggingbecausebothmethodsworkbycombiningmanyclassifiers.Thiscom-parisonallowsustoseparateouttheeffectofmodifyingthedistributiononeachround(whichisdonedifferentlybyeachalgorithm)fromtheeffectofvotingmultipleclassifiers(whichisdonethesamebyeach).Inourexperiments,wecomparedbo。
7、ostingtobaggingusinganumberofdifferentweaklearningalgorithmsofvaryinglevelsofsophistication.Theseinclude:(1)analgorithmthatsearchesforverysimplepredictionruleswhichtestonasingleattribute(similartoHolte’sverysim-pleclassificationrules[14]);(2)analgorithmthatsearchesforasinglegooddecisionrulethattestsonaconjunctionofattributetests(similarinflavortotherule-formationpartofCohen’sRIPPERalgorithm[3]andF¨urnkranzandWidmer’sIREPalgorithm[11]);and(3)Quinlan’sC4.5decision-treealgorithm[18].Wetestedthesealgo。
8、rithmsonacollectionof27benchmarklearningproblemstakenfromtheUCIrepository.Themainconclusionofourexperimentsisthatboost-ingperformssignificantlyanduniformlybetterthanbag-gingwhentheweaklearningalgorithmgeneratesfairlysimpleclassifiers(algorithms(1)and(2)above).WhencombinedwithC4.5,boostingstillseemstooutperformbaggingslightly,buttheresultsarelesscompelling.Wealsofoundthatboostingcanbeusedwithverysim-plerules(algorithm(1))toconstructclassifiersthatarequitegoodrelative,say,toC4.5.KearnsandMansour[16]a。
9、rguethatC4.5canitselfbeviewedasakindofboostingalgo-rithm,soacomparisonofAdaBoostandC4.5canbeseenasacomparisonoftwocompetingboostingalgorithms.SeeDietterich,KearnsandMansour’spaper[4]formoredetailonthispoint.Inthesecondsetofexperiments,wetesttheperfor-manceofboostingonanearestneighborclassifierforhand-writtendigitrecognition.Inthiscasetheweaklearningalgorithmisverysimple,andthisletsusgainsomeinsightintotheinteractionbetweentheboostingalgorithmandthenearestneighborclassifier.Weshowthattheboostingal-。
10、gorithmisaneffectivewayforfindingasmallsubsetofprototypesthatperformsalmostaswellasthecompleteset.WealsoshowthatitcomparesfavorablytothestandardmethodofCondensedNearestNeighbor[13]intermsofitstesterror.Thereseemtobetwoseparatereasonsfortheimprove-mentinperformancethatisachievedbyboosting.Thefirstandbetterunderstoodeffectofboostingisthatitgeneratesahypothesiswhoseerroronthetrainingsetissmallbycom-biningmanyhypotheseswhoseerro。
本文标题:Experiments with a New Boosting Algorithm
链接地址:https://www.777doc.com/doc-4425344 .html