您好,欢迎访问三七文档
当前位置:首页 > 商业/管理/HR > 质量控制/管理 > Basic Communication Operations
BasicCommunicationOperationsAnanthGrama,AnshulGupta,GeorgeKarypis,andVipinKumarToaccompanythetext``IntroductiontoParallelComputing'',AddisonWesley,2003TopicOverview•One-to-AllBroadcastandAll-to-OneReduction•All-to-AllBroadcastandReduction•All-ReduceandPrefix-SumOperations•ScatterandGather•All-to-AllPersonalizedCommunication•CircularShift•ImprovingtheSpeedofSomeCommunicationOperationsBasicCommunicationOperations:Introduction•Manyinteractionsinpracticalparallelprogramsoccurinwell-definedpatternsinvolvinggroupsofprocessors.•Efficientimplementationsoftheseoperationscanimproveperformance,reducedevelopmenteffortandcost,andimprovesoftwarequality.•Efficientimplementationsmustleverageunderlyingarchitecture.Forthisreason,werefertospecificarchitectureshere.•Weselectadescriptivesetofarchitecturestoillustratetheprocessofalgorithmdesign.BasicCommunicationOperations:Introduction•Groupcommunicationoperationsarebuiltusingpoint-to-pointmessagingprimitives.•Recallfromourdiscussionofarchitecturesthatcommunicatingamessageofsizemoveranuncongestednetworktakestimets+tmw.•Weusethisasthebasisforouranalyses.Wherenecessary,wetakecongestionintoaccountexplicitlybyscalingthetwterm.•Weassumethatthenetworkisbidirectionalandthatcommunicationissingle-ported.One-to-AllBroadcastandAll-to-OneReduction•Oneprocessorhasapieceofdata(ofsizem)itneedstosendtoeveryone.•Thedualofone-to-allbroadcastisall-to-onereduction.•Inall-to-onereduction,eachprocessorhasmunitsofdata.Thesedataitemsmustbecombinedpiece-wise(usingsomeassociativeoperator,suchasadditionormin),andtheresultmadeavailableatatargetprocessor.One-to-AllBroadcastandAll-to-OneReductionOne-to-allbroadcastandall-to-onereductionamongprocessors.One-to-AllBroadcastandAll-to-OneReductiononRings•Simplestwayistosendp-1messagesfromthesourcetotheotherp-1processors-thisisnotveryefficient.•Userecursivedoubling:sourcesendsamessagetoaselectedprocessor.Wenowhavetwoindependentproblemsderinedoverhalvesofmachines.•Reductioncanbeperformedinanidenticalfashionbyinvertingtheprocess.One-to-AllBroadcastOne-to-allbroadcastonaneight-nodering.Node0isthesourceofthebroadcast.Eachmessagetransferstepisshownbyanumbered,dottedarrowfromthesourceofthemessagetoitsdestination.Thenumberonanarrowindicatesthetimestepduringwhichthemessageistransferred.All-to-OneReductionReductiononaneight-noderingwithnode0asthedestinationofthereduction.BroadcastandReduction:ExampleConsidertheproblemofmultiplyingamatrixwithavector.•Thenxnmatrixisassignedtoannxn(virtual)processorgrid.Thevectorisassumedtobeonthefirstrowofprocessors.•Thefirststepoftheproductrequiresaone-to-allbroadcastofthevectorelementalongthecorrespondingcolumnofprocessors.Thiscanbedoneconcurrentlyforallncolumns.•Theprocessorscomputelocalproductofthevectorelementandthelocalmatrixentry.•Inthefinalstep,theresultsoftheseproductsareaccumulatedtothefirstrowusingnconcurrentall-to-onereductionoperationsalongthecolumns(usingthesumoperation).BroadcastandReduction:Matrix-VectorMultiplicationExampleOne-to-allbroadcastandall-to-onereductioninthemultiplicationofa4x4matrixwitha4x1vector.BroadcastandReductiononaMesh•Wecanvieweachrowandcolumnofasquaremeshofpnodesasalineararrayof√pnodes.•Broadcastandreductionoperationscanbeperformedintwosteps-thefirststepdoestheoperationalongarowandthesecondstepalongeachcolumnconcurrently.•Thisprocessgeneralizestohigherdimensionsaswell.BroadcastandReductiononaMesh:ExampleOne-to-allbroadcastona16-nodemesh.BroadcastandReductiononaHypercube•Ahypercubewith2dnodescanberegardedasad-dimensionalmeshwithtwonodesineachdimension.•Themeshalgorithmcanbegeneralizedtoahypercubeandtheoperationiscarriedoutind(=logp)steps.BroadcastandReductiononaHypercube:ExampleOne-to-allbroadcastonathree-dimensionalhypercube.Thebinaryrepresentationsofnodelabelsareshowninparentheses.BroadcastandReductiononaBalancedBinaryTree•Considerabinarytreeinwhichprocessorsare(logically)attheleavesandinternalnodesareroutingnodes.•Assumethatsourceprocessoristherootofthistree.Inthefirststep,thesourcesendsthedatatotherightchild(assumingthesourceisalsotheleftchild).Theproblemhasnowbeendecomposedintotwoproblemswithhalfthenumberofprocessors.BroadcastandReductiononaBalancedBinaryTreeOne-to-allbroadcastonaneight-nodetree.BroadcastandReductionAlgorithms•Allofthealgorithmsdescribedaboveareadaptationsofthesamealgorithmictemplate.•Weillustratethealgorithmforahypercube,butthealgorithm,ashasbeenseen,canbeadaptedtootherarchitectures.•Thehypercubehas2dnodesandmy_idisthelabelforanode.•Xisthemessagetobebroadcast,whichinitiallyresidesatthesourcenode0.BroadcastandReductionAlgorithmsOne-to-allbroadcastofamessageXfromsourceonahypercube.BroadcastandReductionAlgorithmsSingle-nodeaccumulationonad-dimensionalhypercube.EachnodecontributesamessageXcontainingmwords,andnode0isthedestination.CostAnalysis•Thebroadcastorreductionprocedureinvolveslogppoint-to-pointsimplemessagetransfers,eachatatimecostofts+twm.•Thetotaltimeisthereforegivenby:All-to-AllBroadcastandReduction•Generalizationofbroadcastinwhicheachprocessoristhesourceaswellasdestination.•Apr
本文标题:Basic Communication Operations
链接地址:https://www.777doc.com/doc-4282431 .html