Scalable global and local hashing strategies for d

Submitted to IEEE Transactions on Parallel and Distributed Systems, 1994.

Scalable Global and Local Hashing Strategies for Duplicate Pruning in Parallel A* Graph Search¹

Nihar R. Mahapatra and Shantanu Dutt
Department of Electrical Engineering, University of Minnesota, Minneapolis, MN 55455

Principal Contact: Shantanu Dutt
Phone: 612-625-0323; Fax: 612-625-4583
email address: dutt@ee.umn.edu

July 19, 1995

Abstract

For many applications of the A* algorithm, the state space is a graph rather than a tree. The implication of this for parallel A* algorithms is that different processors may perform significant duplicated work if inter-processor duplicates are not pruned. In this paper, we consider the problem of duplicate pruning in parallel A* graph-search algorithms implemented on distributed-memory machines. A commonly used method for duplicate pruning uses a hash function to associate with each distinct node of the search space a particular processor to which duplicate nodes arising in different processors are transmitted and thereby pruned. This approach has two major drawbacks. First, load balance is determined solely by the hash function. Second, node transmissions for duplicate pruning are global; this can lead to hot spots and slower message delivery. To overcome these problems, we propose two different duplicate pruning strategies: (1) To achieve good load balance, we decouple the task of duplicate pruning from load balancing, by using a hash function for the former and a load balancing scheme for the latter. (2) A novel search-space partitioning scheme that allocates disjoint parts of the search space to disjoint subcubes in a hypercube (or disjoint processor groups in the target architecture), so that duplicate pruning is achieved with only intra-subcube or adjacent inter-subcube communication. Thus message latency and hot-spot probability are greatly reduced. The above duplicate pruning schemes were implemented on an nCUBE2 hypercube multicomputer to solve the Traveling Salesman Problem (TSP). For uniformly distributed inter-city costs, our strategies yield a speedup improvement of 13-35% on 1024 processors over previous methods that do not prune any duplicates, and 13-25% over the previous hashing-only scheme. For normally distributed data the corresponding figures are 135% and 10-155%. Finally, we analyze the scalability of our parallel A* algorithms on k-ary n-cube networks in terms of the isoefficiency metric, and show that they have isoefficiency lower and upper bounds of Θ(P log P) and Θ(P k n²), respectively.

Keywords: A* algorithm, branch-and-bound search, communication delay, duplicate pruning, graph search, isoefficiency function, k-ary n-cubes, parallel A*, scalability, traveling salesman problem.

¹ This research was funded in part by a Grant-in-Aid from the University of Minnesota Graduate School and in part by the NSF grant MIP-9210049. Sandia National Labs provided access to their 1024-processor nCUBE2 parallel computer.

1 Introduction

The A* algorithm [24] is a well-known, generalized branch-and-bound search procedure, widely used in the solution of many computationally demanding combinatorial optimization problems (COP's) [27]. Its operation, as detailed later, can be viewed essentially as a best-first search of a state-space graph. Parallelization of branch-and-bound methods provides an effective means to meet the computational needs of many practical search problems [7].

Many researchers have adopted a tree search-space formulation in problems like the Traveling Salesman Problem (TSP) [12, 23], the 15-puzzle problem [12] and the vertex cover problem [12], where the natural formulation is a graph. This implies that nodes representing identical subproblems are artificially defined to have different states. Thus identical-subproblem nodes remain undetected resulting in duplication of search. This leads to an inefficient, although easily parallelizable, sequential A*/best-first search algorithm: each processor explores a different part of the search space that is disjoint from the search spaces of other processors. Thus a "misleading" good speedup can be obtained. A graph formulation, on the other hand, will enable detection of identical subproblems in A* and hence avoid duplicated search. However, it is not as easily amenable to parallelization on distributed-memory machines as a tree formulation, since duplicate nodes may be generated in different processors.

A commonly used method for pruning inter-processor duplicates in graph search, utilizes a suitable hash function to associate an "owner" processor with each distinct node of the search space. Then duplicate nodes arising in different processors are transmitted to the same owner where they are pruned [8, 21]. There are two significant shortcomings in this approach. First, load balance is determined solely by the hash function and hence may not be very effective. Second, node transmissions for duplicate pruning are global; this can lead to higher message contention (hot spots) and latency.

The overall goal of our work is to develop scalable high-performance parallel A* algorithms for distributed-memory machines that can be applied to most COP's. In our previous work on parallel A*, we developed efficient load balancing strategies [5, 6] that are equally useful for both tree-search and graph-search problems, and demonstrated their superior performance over other competing methods [1, 11, 13, 17, 25]. In this paper, we develop efficient inter-processor duplicate pruning methods, and incorporate them in parallel A* algorithms to obtain high speedup over sequential A* graph search.

In Sec. 2, we first describe A* and then present an improved version used in our implementations. Next, in Sec. 3 we outline our application of A* to TSP, the test problem used to determine the efficacy of our parallelization techniqu
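The hash-based "owner" scheme described in the introduction can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions, not the authors' implementation: the TSP state encoding (set of visited cities plus the current city), the names `owner_of`, `Processor` and `route`, and the use of SHA-256 as the hash function are all hypothetical choices made for illustration, and message passing is replaced by a direct method call.

```python
import hashlib


def owner_of(state, num_procs):
    """Deterministically map a search state to an owner processor id.

    Assumed state encoding: (frozenset of visited cities, current city).
    """
    visited, current = state
    key = ",".join(map(str, sorted(visited))) + "|" + str(current)
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_procs


class Processor:
    """Stands in for one node of a distributed-memory machine."""

    def __init__(self, pid):
        self.pid = pid
        self.best_cost = {}  # state -> lowest path cost seen so far

    def receive(self, state, cost):
        """Return True if the node is kept, False if pruned as a duplicate."""
        old = self.best_cost.get(state)
        if old is not None and old <= cost:
            return False  # an equal-or-cheaper copy is already stored here
        self.best_cost[state] = cost
        return True


def route(procs, state, cost):
    """Send a generated node to its owner; return (owner id, kept?)."""
    owner = owner_of(state, len(procs))
    return owner, procs[owner].receive(state, cost)


# Two partial tours, 0->1->2->3 and 0->2->1->3, reach the same state.
procs = [Processor(i) for i in range(4)]
state_a = (frozenset({0, 1, 2, 3}), 3)
state_b = (frozenset({0, 2, 1, 3}), 3)
first = route(procs, state_a, 10.0)
second = route(procs, state_b, 12.0)
```

Because both partial tours hash to the same owner, the second, costlier copy is detected and discarded there. The sketch also makes the two drawbacks visible: the owner of a node is fixed by the hash value alone, and any processor may need to send to any other.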
