您好,欢迎访问三七文档
IntroductiontoParallelComputingSolutionManualAnanthGramaAnshulGuptaGeorgeKarypisVipinKumarCopyrightc2003byAsdisonWesleyContentsCHAPTER1Introduction1CHAPTER2ModelsofParallelComputers3CHAPTER3PrinciplesofParallelAlgorithmDesign11CHAPTER4BasicCommunicationOperations13CHAPTER5AnalyticalModelingofParallelPrograms17CHAPTER6ProgrammingUsingtheMessage-PassingParadigm21CHAPTER7ProgrammingSharedAddressSpacePlatforms23CHAPTER8DenseMatrixAlgorithms25CHAPTER9Sorting33CHAPTER10GraphAlgorithms43CHAPTER11SearchAlgorithmsforDiscreteOptimizationProblems51CHAPTER12DynamicProgramming53CHAPTER13FastFourierTransform59Bibliography63iPrefaceThisinstructorsguidetoaccompanythetext”IntroductiontoParallelComputing”containssolutionstoselectedprob-lems.Forsomeproblemsthesolutionhasbeensketched,andthedetailshavebeenleftout.Whensolutionstoproblemsareavailabledirectlyinpublications,referenceshavebeenprovided.Wherenecessary,thesolutionsaresupplementedbyfigures.Figureandequationnumbersarerepresentedinromannumeralstodifferentiatethemfromthefiguresandequationsinthetext.iiiCHAPTER1Introduction1Atthetimeofcompilation(11/02),thefivemostpowerfulcomputersontheTop500listalongwiththeirpeakGFLOPratingsare:1.NECEarth-Simulator/5120,40960.00.2.IBMASCIWhite,SPPower3375MHz/819212288.00.3.LinuxNetworXMCRLinuxClusterXeon2.4GHz-Quadrics/2304,11060.00.4.Hewlett-PackardASCIQ-AlphaServerSCES45/1.25GHz/4096,10240.00.5.Hewlett-PackardASCIQ-AlphaServerSCES45/1.25GHz/409610240.00.2Amongmanyinterestingapplications,herearearepresentativefew:1.Structuralmechanics:crashtestingofautomobiles,simulationofstructuralresponseofbuildingsandbridgestoearthquakesandexplosions,responseofnanoscalecantileverstoverysmallelectromagneticfields.2.Computationalbiology:structureofbiomolecules(proteinfolding,moleculardocking),sequencematch-ingforsimilaritysearchinginbiologicaldatabases,simulationofbiologicalphenomena(vascularflows,impulsepropagationinnervetissue,etc).3.Commercialapplications:transactionprocessing,datamining,scalablewebanddatabaseservers.3Datatoofluidtoplot.4Datatoofluidtoplot.1CHAPTER2ModelsofParallelComputers1Agoodapproximationtothebandwidthcanbeobtainedfromaloopthataddsalargearrayofintegers:for(i=0;i1000000;i++)sum+=a[i];withsumandarrayasuitablyinitialized.Thetimeforthisloopalongwiththesizeofanintegercanbeusedtocomputebandwidth(notethatthiscomputationislargelymemoryboundandthetimeforadditioncanbelargelyignored).ToestimateL1cachesize,writea3-loopmatrixmultiplicationprogram.Plotthecomputationrateofthisprogramasafunctionofmatrixsizen.Fromthisplot,determinesuddendropsinperformance.Thesizeatwhichthesedropsoccur,combinedwiththedatasize(2n2)andwordsizecanbeusedtoestimateL1cachesize.2Thecomputationperforms8FLOPSon2cachelines,i.e.,8FLOPSin200ns.Thiscorrespondstoacompu-tationrateof40MFLOPS.3Inthebestcase,thevectorgetscached.Inthiscase,8FLOPScanbeperformedon1cacheline(forthematrix).Thiscorrespondstoapeakcomputationrateof80MFLOPS(notethatthematrixdoesnotfitinthecache).4Inthiscase,8FLOPScanbeperformedon5cachelines(oneformatrixaandfourforcolumn-majoraccesstomatrixb).Thiscorrespondstoaspeedof16MFLOPS.5Forsamplecodes,seeanySGEMM/DGEMMBLASlibrarysourcecode.6Meanaccesstime=0.8×1+0.1×100+0.8×400≈50ns.Thiscorrespondstoacomputationrateof20MFLOPS(assuming1FLOP/word).Meanaccesstimeforserialcomputation=0.7×1+0.3×100≈30ns.Thiscorrespondstoacomputationrateof33MFLOPS.FractionalCPUrate=20/33≈0.60.7Solutionintext.8Scalingtheswitchwhilemaintainingthroughputismajorchallenge.ThecomplexityoftheswitchisO(p2).9CRCWPRAMisthemostpowerfulbecauseitcanemulateothermodelswithoutanyperformanceoverhead.Thereverseisnottrue.10Weillustratetheequivalenceofabutterflyandanomeganetworkforan8-inputnetworkbyrearrangingtheswitchesofanomeganetworksothatitlookslikeabutterflynetworkThisisshowninFigure2.1[Lei92a].34ModelsofParallelComputers111,0110,0101,0100,0011,0010,0001,0000,0000,1010,1100,1110,1001,1011,1101,1111,1000,2100,2001,2101,2010,2110,2011,2111,2000,3001,3010,3011,3100,3101,3110,3111,3Figure2.1An8-inputomeganetworkredrawntolooklikeabutterflynetwork.Nodei,l(nodeiatlevell)isidenticaltonodej,linthebutterflynetwork,wherejisobtainedbyrightcircularshiftingthebinaryrepresentationofiltimes.12ConsideracycleA1,A2,...,Akinahypercube.AswetravelfromnodeAitoAi+1,thenumberofonesintheprocessorlabel(thatis,theparity)mustchange.SinceA1=Ak,thenumberofparitychangesmustbeeven.Therefore,therecanbenocyclesofoddlengthinahypercube.(ProofadaptedfromSaadandShultz[SS88]).13Considera2dprocessorhypercube.Byfixingkofthedbitsintheprocessorlabel,wecanchangetheremainingd−kbits.Thereare2d−kdistinctprocessorsthathaveidenticalvaluesattheremainingkbitpositions.Ap-processorhypercubehasthepropertythateveryprocessorhaslogpcommunicationlinks,oneeachtoaprocessorwhoselabeldiffersinonebitposition.Toprovethatthe2d−kprocessorsareconnectedinahypercubetopology,weneedtoprovethateachprocessorinagrouphasd−kcommunicationlinksgoingtootherprocessorsinthesamegroup.Sincetheselecteddbitsarefixedforeachprocessorintheg
本文标题:Solution-Manual-for-Introduction-to-Parallel-Compu
链接地址:https://www.777doc.com/doc-4431076 .html