Knowl Inf Syst (2008) 14:1–37
DOI 10.1007/s10115-007-0114-2

SURVEY PAPER

Top 10 algorithms in data mining

Xindong Wu · Vipin Kumar · J. Ross Quinlan · Joydeep Ghosh · Qiang Yang · Hiroshi Motoda · Geoffrey J. McLachlan · Angus Ng · Bing Liu · Philip S. Yu · Zhi-Hua Zhou · Michael Steinbach · David J. Hand · Dan Steinberg

Received: 9 July 2007 / Revised: 28 September 2007 / Accepted: 8 October 2007
Published online: 4 December 2007
© Springer-Verlag London Limited 2007

Abstract This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART. These top 10 algorithms are among the most influential data mining algorithms in the research community. With each algorithm, we provide a description of the algorithm, discuss the impact of the algorithm, and review current and further research on the algorithm. These 10 algorithms cover classification, clustering, statistical learning, association analysis, and link mining, which are all among the most important topics in data mining research and development.

X. Wu (✉), Department of Computer Science, University of Vermont, Burlington, VT, USA; e-mail: xwu@cs.uvm.edu
V. Kumar, Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA; e-mail: kumar@cs.umn.edu
J. Ross Quinlan, Rulequest Research Pty Ltd, St Ives, NSW, Australia; e-mail: quinlan@rulequest.com
J. Ghosh, Department of Electrical and Computer Engineering, University of Texas at Austin, Austin, TX 78712, USA; e-mail: ghosh@ece.utexas.edu
Q. Yang, Department of Computer Science, Hong Kong University of Science and Technology, Hong Kong, China; e-mail: qyang@cs.ust.hk
H. Motoda, AFOSR/AOARD and Osaka University, 7-23-17 Roppongi, Minato-ku, Tokyo 106-0032, Japan; e-mail: motoda@ar.sanken.osaka-u.ac.jp
G. J. McLachlan, Department of Mathematics, The University of Queensland, Brisbane, Australia; e-mail: gjm@maths.uq.edu.au
A. Ng, School of Medicine, Griffith University, Brisbane, Australia
B. Liu, Department of Computer Science, University of Illinois at Chicago, Chicago, IL 60607, USA
P. S. Yu, IBM T. J. Watson Research Center, Hawthorne, NY 10532, USA; e-mail: psyu@us.ibm.com
Z.-H. Zhou, National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China; e-mail: zhouzh@nju.edu.cn
M. Steinbach, Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455, USA; e-mail: steinbac@cs.umn.edu
D. J. Hand, Department of Mathematics, Imperial College, London, UK; e-mail: d.j.hand@imperial.ac.uk
D. Steinberg, Salford Systems, San Diego, CA 92123, USA; e-mail: dsx@salford-systems.com

0 Introduction

In an effort to identify some of the most influential algorithms that have been widely used in the data mining community, the IEEE International Conference on Data Mining (ICDM, ~icdm/) identified the top 10 algorithms in data mining for presentation at ICDM ’06 in Hong Kong.

As the first step in the identification process, in September 2006 we invited the ACM KDD Innovation Award and IEEE ICDM Research Contributions Award winners to each nominate up to 10 best-known algorithms in data mining. All except one in this distinguished set of award winners responded to our invitation. We asked that each nomination include the following information: (a) the algorithm name, (b) a brief justification, and (c) a representative publication reference. We also advised that each nominated algorithm should have been widely cited and used by other researchers in the field, and that the nominations from each nominator as a group should have a reasonable representation of the different areas in data mining.

After the nominations in Step 1, we verified each nomination for its citations on Google Scholar in late October 2006, and removed those nominations that did not have at least 50 citations. All remaining (18) nominations were then organized in 10 topics: association analysis, classification, clustering, statistical learning, bagging and boosting, sequential patterns, integrated mining, rough sets, link mining, and graph mining. For some of these 18 algorithms, such as k-means, the representative publication was not necessarily the original paper that introduced the algorithm, but a recent paper that highlights the importance of the technique. These representative publications are available at the ICDM website (~icdm/algorithms/CandidateList.shtml).

In the third step of the identification process, we had a wider involvement of the research community. We invited the Program Committee members of KDD-06 (the 2006 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining), ICDM ’06 (the 2006 IEEE International Conference on Data Mining), and SDM ’06 (the 2006 SIAM International Conference on Data Mining), as well as the ACM KDD Innovation Award and IEEE ICDM Research Contributions Award winners, to each vote for up to 10 well-known algorithms from the 18-algorithm candidate list. The voting results of this step were presented at the ICDM ’06 panel on Top 10 Algorithms in Data Mining.

At the ICDM ’06 panel of December 21, 2006, we also took an open vote with all 145 attendees on the top 10 algorithms from the above 18-algorithm candidate list, and the top 10 algorithms from this open vote were the same as the voting results from the above third step. The 3-hour panel was organized as the last session of the ICDM ’06 conference, in parallel with 7 paper presentation sessions of the Web Intelligence (WI ’06) and Intelligent Agent Technology (IAT ’06) conferences at the same location, and attracting 145 participants to this panel clearly showed that the panel was a great success.

1 C4.5 a