A Fast, Parallel Spanning Tree Algorithm for Symme

A Fast, Parallel Spanning Tree Algorithm for Symmetric Multiprocessors (SMPs)

David A. Bader∗ Guojing Cong
Electrical and Computer Engineering Department
University of New Mexico, Albuquerque, NM 87131
{dbader, cong}@ece.unm.edu

October 19, 2003

∗ This work was supported in part by NSF Grants CAREER ACI-00-93039, ITR ACI-00-81404, DEB-99-10123, ITR EIA-01-21377, Biocomplexity DEB-01-20709, and ITR EF/BIO 03-31654.

Abstract

The ability to provide uniform shared-memory access to a significant number of processors in a single SMP node brings us much closer to the ideal PRAM parallel computer. Many PRAM algorithms can be adapted to SMPs with few modifications. Yet there are few studies that deal with the implementation and performance issues of running PRAM algorithms on SMPs. Our study in this paper focuses on implementing parallel spanning tree algorithms on SMPs. Spanning tree is an important problem in the sense that it is the building block for many other parallel graph algorithms and also because it is representative of a large class of irregular combinatorial problems that have simple and efficient sequential implementations and fast PRAM algorithms, but often have no known efficient parallel implementations. Experimental studies have been conducted on related problems (minimum spanning tree and connected components) using parallel computers, but only achieved reasonable speedup on regular graph topologies that can be implicitly partitioned with good locality features or on very dense graphs with limited numbers of vertices. In this paper we present a new randomized algorithm and implementation with superior performance that for the first time achieves parallel speedup on arbitrary graphs (both regular and irregular topologies) when compared with the best sequential implementation for finding a spanning tree. This new algorithm uses several techniques to give an expected running time that scales linearly with the number $p$ of processors for suitably large inputs ($n > p^2$). As the spanning tree problem is notoriously hard for any parallel implementation to achieve reasonable speedup, our study may shed new light on implementing PRAM algorithms for shared-memory parallel computers.

The main results of this paper are

1. A new and practical spanning tree algorithm for symmetric multiprocessors that exhibits parallel speedups on graphs with regular and irregular topologies; and
2. An experimental study of parallel spanning tree algorithms that reveals the superior performance of our new approach compared with the previous algorithms.

The source code for these algorithms is freely available from our website hpc.ece.unm.edu.

1 Introduction

Finding a spanning tree of a graph is an important building block for many graph algorithms, for example, biconnected components and ear decomposition, and can be used in graph planarity testing. The best sequential algorithm for finding a spanning tree of a graph $G = (V, E)$, where $n = |V|$ and $m = |E|$, uses depth- or breadth-first graph traversal, whose time complexity is $O(m + n)$. The implementation of the sequential algorithm is very efficient (linear time with a very small hidden constant), and the only data structure used is a stack or queue, which has good locality features.

However, graph traversal using depth-first search is inherently sequential and not known to parallelize efficiently [33]. Thus, the previous approaches for parallel spanning tree algorithms use novel techniques other than traversal that are conducive to parallelism and have polylogarithmic time complexities. In practice, none of these parallel algorithms has shown significant parallel speedup over the best sequential algorithm for irregular graphs, because the theoretic models do not realistically capture the cost for communication on current parallel machines, the algorithm is too complex for implementation, or there are large constants hidden in the asymptotic notation that could not be overcome by a parallel implementation.

Symmetric multiprocessor (SMP) architectures, in which several processors operate in a true, hardware-based, shared-memory environment, are becoming commonplace. Indeed, most of the new high-performance computers are clusters of SMPs having from 2 to over 100 processors per node. The ability to provide uniform-memory-access (UMA) shared-memory for a significant number of processors brings us much closer to the ideal parallel computer envisioned over 20 years ago by theoreticians, the Parallel Random Access Machine (PRAM) (see [22, 34]), and thus may enable us at last to take advantage of 20 years of research in PRAM algorithms for various irregular computations (such as spanning tree and other graph algorithms). Moreover, as supercomputers increasingly use SMP clusters, SMP computations will play a significant role in supercomputing.

While an SMP is a shared-memory architecture, it is by no means the PRAM used in theoretical work: synchronization cannot be taken for granted and the number of processors is far smaller than that assumed in PRAM algorithms. The significant feature of SMPs is that they provide much faster access to their shared-memory than an equivalent message-based architecture. Even the largest SMP to date, the recently delivered 106-processor Sun Fire Enterprise 15000 (E15K), has a worst-case memory access time of 450 ns (from any processor to any location within its 576 GB memory); in contrast, the latency for access to the memory of another processor in a distributed-memory architecture is measured in tens of μs. In other words, message-based architectures are two orders of magnitude slower than the largest SMPs in terms of their worst-case memory access times. The Sun E15K [5, 6] uses a combination of data crossbar switches, multiple snooping buses, and sophisticated cache handling to achieve UMA across the entire memory. Of course, there rema
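The sequential baseline mentioned in the introduction (a breadth-first traversal that builds a spanning tree in $O(m + n)$ time using only a queue) can be sketched as follows. This is a minimal illustration and not the parallel algorithm the paper proposes; the function names `bfs_spanning_tree` and `tree_edges`, the edge-list input format, and the parent-array output are assumptions chosen for the example.

```python
from collections import deque


def bfs_spanning_tree(n, edges, root=0):
    """Sequential BFS spanning tree (illustrative sketch, hypothetical API).

    n     -- number of vertices, labelled 0..n-1
    edges -- iterable of undirected edges (u, v)
    Returns a parent array: parent[root] == root, and parent[v] == -1
    for any vertex not reachable from root.
    """
    # Build adjacency lists: O(m + n) time and space.
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    parent = [-1] * n
    parent[root] = root
    queue = deque([root])  # the only auxiliary structure is a queue

    # Each vertex is enqueued at most once and each edge is scanned
    # at most twice, giving O(m + n) total work.
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if parent[v] == -1:  # first visit: (u, v) becomes a tree edge
                parent[v] = u
                queue.append(v)
    return parent


def tree_edges(parent):
    """List the tree edges (parent, child) encoded by a parent array."""
    return [(p, v) for v, p in enumerate(parent) if p != -1 and p != v]


if __name__ == "__main__":
    # A 4-vertex graph containing one cycle (0-1-2) and a pendant vertex 3.
    example = [(0, 1), (0, 2), (1, 2), (2, 3)]
    print(tree_edges(bfs_spanning_tree(4, example)))
```

For a connected graph the result contains exactly $n - 1$ tree edges. Replacing the queue with a stack gives a depth-first variant with the same asymptotic cost.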
