您好,欢迎访问三七文档
AFast,ParallelSpanningTreeAlgorithmforSymmetricMultiprocessors(SMPs)DavidA.Bader∗GuojingCongElectricalandComputerEngineeringDepartmentUniversityofNewMexico,Albuquerque,NM87131{dbader,cong}@ece.unm.eduOctober19,2003AbstractTheabilitytoprovideuniformshared-memoryaccesstoasignificantnumberofprocessorsinasingleSMPnodebringsusmuchclosertotheidealPRAMparallelcomputer.ManyPRAMalgorithmscanbeadaptedtoSMPswithfewmodifications.Yettherearefewstudiesthatdealwiththeimplemen-tationandperformanceissuesofrunningPRAMalgorithmsonSMPs.OurstudyinthispaperfocusesonimplementingparallelspanningtreealgorithmsonSMPs.Spanningtreeisanimportantprobleminthesensethatitisthebuildingblockformanyotherparallelgraphalgorithmsandalsobecause∗ThisworkwassupportedinpartbyNSFGrantsCAREERACI-00-93039,ITRACI-00-81404,DEB-99-10123,ITREIA-01-21377,BiocomplexityDEB-01-20709,andITREF/BIO03-31654.1itisrepresentativeofalargeclassofirregularcombinatorialproblemsthathavesimpleandefficientsequentialimplementationsandfastPRAMalgorithms,butoftenhavenoknownefficientparallelim-plementations.Experimentalstudieshavebeenconductedonrelatedproblems(minimumspanningtreeandconnectedcomponents)usingparallelcomputers,butonlyachievedreasonablespeeduponregulargraphtopologiesthatcanbeimplicitlypartitionedwithgoodlocalityfeaturesoronverydensegraphswithlimitednumbersofvertices.Inthispaperwepresentanewrandomizedalgorithmandimplementationwithsuperiorperformancethatforthefirst-timeachievesparallelspeeduponarbitrarygraphs(bothregularandirregulartopologies)whencomparedwiththebestsequentialimplementationforfindingaspanningtree.Thisnewalgorithmusesseveraltechniquestogiveanexpectedrunningtimethatscaleslinearlywiththenumberpofprocessorsforsuitablylargeinputs(np2).Asthespanningtreeproblemisnotoriouslyhardforanyparallelimplementationtoachievereasonablespeedup,ourstudymayshednewlightonimplementingPRAMalgorithmsforshared-memoryparallelcomputers.Themainresultsofthispaperare1.Anewandpracticalspanningtreealgorithmforsymmetricmultiprocessorsthatexhibitsparallelspeedupsongraphswithregularandirregulartopologies;and2.Anexperimentalstudyofparallelspanningtreealgorithmsthatrevealsthesuperiorperformanceofournewapproachcomparedwiththepreviousalgorithms.Thesourcecodeforthesealgorithmsisfreely-availablefromourwebsitehpc.ece.unm.edu.1IntroductionFindingaspanningtreeofagraphisanimportantbuildingblockformanygraphalgorithms,forexample,biconnectedcomponentsandeardecompositionandcanbeusedingraphplanaritytest-ing.ThebestsequentialalgorithmforfindingaspanningtreeofagraphG=(V,E)wheren=|V|andm=|E|usesdepth-orbreadth-firstgraphtraversal,whosetimecomplexityisO(m+n).Theimplementationofthesequentialalgorithmisveryefficient(lineartimewithaverysmallhiddenconstant),andtheonlydatastructureusedisastackorqueuewhichhasgoodlocalityfeatures.However,graphtraversalusingdepth-firstsearchisinherentlysequentialandnotknowntopar-allelizeefficiently[33].Thus,thepreviousapproachesforparallelspanningtreealgorithmsusenoveltechniquesotherthantraversalthatareconducivetoparallelismandhavepolylogarithmictimecomplexities.Inpractice,noneoftheseparallelalgorithmshasshownsignificantparallelspeedupoverthebestsequentialalgorithmforirregulargraphs,becausethetheoreticmodelsdonotrealisticallycapturethecostforcommunicationoncurrentparallelmachines,thealgorithmistoocomplexforimplementation,ortherearelargeconstantshiddenintheasymptoticnotationthatcouldnotbeovercomebyaparallelimplementation.Symmetricmultiprocessor(SMP)architectures,inwhichseveralprocessorsoperateinatrue,hardware-based,shared-memoryenvironmentarebecomingcommonplace.Indeed,mostofthenewhigh-performancecomputersareclustersofSMPshavingfrom2toover100processorspernode.Theabilitytoprovideuniform-memory-access(UMA)shared-memoryforasignificantnumberofprocessorsbringsusmuchclosertotheidealparallelcomputerenvisionedover20yearsagobytheoreticians,theParallelRandomAccessMachine(PRAM)(see[22,34])andthusmay1enableusatlasttotakeadvantageof20yearsofresearchinPRAMalgorithmsforvariousirregularcomputations(suchasspanningtreeandothergraphalgorithms).Moreover,assupercomputersincreasinglyuseSMPclusters,SMPcomputationswillplayasignificantroleinsupercomputing.WhileanSMPisashared-memoryarchitecture,itisbynomeansthePRAMusedintheoreticalwork—synchronizationcannotbetakenforgrantedandthenumberofprocessorsisfarsmallerthanthatassumedinPRAMalgorithms.ThesignificantfeatureofSMPsisthattheyprovidemuchfasteraccesstotheirshared-memorythananequivalentmessage-basedarchitecture.EventhelargestSMPtodate,therecentlydelivered106-processorSunFireEnterprise15000(E15K),hasaworst-casememoryaccesstimeof450ns(fromanyprocessortoanylocationwithinits576GBmemory);incontrast,thelatencyforaccesstothememoryofanotherprocessorinadistributed-memoryarchitectureismeasuredintensofμs.Inotherwords,message-basedarchitecturesaretwoordersofmagnitudeslowerthanthelargestSMPsintermsoftheirworst-casememoryaccesstimes.TheSunE15K[5,6]usesacombinationofdatacrossbarswitches,multiplesnoopingbuses,andsophisticatedcachehandlingtoachieveUMAacrosstheentirememory.Ofcourse,thererema
本文标题:A Fast, Parallel Spanning Tree Algorithm for Symme
链接地址:https://www.777doc.com/doc-3327964 .html