Solution-Manual-for-Introduction-to-Parallel-Compu

Introduction to Parallel Computing
Solution Manual

Ananth Grama
Anshul Gupta
George Karypis
Vipin Kumar

Copyright © 2003 by Addison Wesley

Contents

CHAPTER 1 Introduction
CHAPTER 2 Models of Parallel Computers
CHAPTER 3 Principles of Parallel Algorithm Design
CHAPTER 4 Basic Communication Operations
CHAPTER 5 Analytical Modeling of Parallel Programs
CHAPTER 6 Programming Using the Message-Passing Paradigm
CHAPTER 7 Programming Shared Address Space Platforms
CHAPTER 8 Dense Matrix Algorithms
CHAPTER 9 Sorting
CHAPTER 10 Graph Algorithms
CHAPTER 11 Search Algorithms for Discrete Optimization Problems
CHAPTER 12 Dynamic Programming
CHAPTER 13 Fast Fourier Transform
Bibliography

Preface

This instructor's guide to accompany the text "Introduction to Parallel Computing" contains solutions to selected problems. For some problems the solution has been sketched, and the details have been left out. When solutions to problems are available directly in publications, references have been provided. Where necessary, the solutions are supplemented by figures. Figure and equation numbers are represented in roman numerals to differentiate them from the figures and equations in the text.

CHAPTER 1 Introduction

1 At the time of compilation (11/02), the five most powerful computers on the Top 500 list along with their peak GFLOP ratings are:

   1. NEC Earth-Simulator/5120, 40960.00.
   2. IBM ASCI White, SP Power3 375 MHz/8192, 12288.00.
   3. Linux NetworX MCR Linux Cluster Xeon 2.4 GHz - Quadrics/2304, 11060.00.
   4. Hewlett-Packard ASCI Q - AlphaServer SC ES45/1.25 GHz/4096, 10240.00.
   5. Hewlett-Packard ASCI Q - AlphaServer SC ES45/1.25 GHz/4096, 10240.00.

2 Among many interesting applications, here are a representative few:

   1. Structural mechanics: crash testing of automobiles, simulation of structural response of buildings and bridges to earthquakes and explosions, response of nanoscale cantilevers to very small electromagnetic fields.
   2. Computational biology: structure of biomolecules (protein folding, molecular docking), sequence matching for similarity searching in biological databases, simulation of biological phenomena (vascular flows, impulse propagation in nerve tissue, etc).
   3. Commercial applications: transaction processing, data mining, scalable web and database servers.

3 Data too fluid to plot.

4 Data too fluid to plot.

CHAPTER 2 Models of Parallel Computers

1 A good approximation to the bandwidth can be obtained from a loop that adds a large array of integers:

      for (i = 0; i < 1000000; i++) sum += a[i];

with sum and array a suitably initialized. The time for this loop along with the size of an integer can be used to compute bandwidth (note that this computation is largely memory bound and the time for addition can be largely ignored).

To estimate L1 cache size, write a 3-loop matrix multiplication program. Plot the computation rate of this program as a function of matrix size n. From this plot, determine sudden drops in performance. The size at which these drops occur, combined with the data size (2n^2) and word size, can be used to estimate L1 cache size.

2 The computation performs 8 FLOPS on 2 cache lines, i.e., 8 FLOPS in 200 ns. This corresponds to a computation rate of 40 MFLOPS.

3 In the best case, the vector gets cached. In this case, 8 FLOPS can be performed on 1 cache line (for the matrix). This corresponds to a peak computation rate of 80 MFLOPS (note that the matrix does not fit in the cache).

4 In this case, 8 FLOPS can be performed on 5 cache lines (one for matrix a and four for column-major access to matrix b). This corresponds to a speed of 16 MFLOPS.

5 For sample codes, see any SGEMM/DGEMM BLAS library source code.

6 Mean access time = 0.8 × 1 + 0.1 × 100 + 0.1 × 400 ≈ 50 ns. This corresponds to a computation rate of 20 MFLOPS (assuming 1 FLOP/word).
Mean access time for serial computation = 0.7 × 1 + 0.3 × 100 ≈ 30 ns. This corresponds to a computation rate of 33 MFLOPS.
Fractional CPU rate = 20/33 ≈ 0.60.

7 Solution in text.

8 Scaling the switch while maintaining throughput is a major challenge. The complexity of the switch is O(p^2).

9 CRCW PRAM is the most powerful because it can emulate other models without any performance overhead. The reverse is not true.

10 We illustrate the equivalence of a butterfly and an omega network for an 8-input network by rearranging the switches of an omega network so that it looks like a butterfly network. This is shown in Figure 2.1 [Lei92a].

[Figure 2.1 An 8-input omega network redrawn to look like a butterfly network. Node i,l (node i at level l) is identical to node j,l in the butterfly network, where j is obtained by right circular shifting the binary representation of i l times.]

12 Consider a cycle A1, A2, ..., Ak in a hypercube. As we travel from node Ai to Ai+1, the number of ones in the processor label (that is, the parity) must change. Since A1 = Ak, the number of parity changes must be even. Therefore, there can be no cycles of odd length in a hypercube.
(Proof adapted from Saad and Shultz [SS88]).

13 Consider a 2^d processor hypercube. By fixing k of the d bits in the processor label, we can change the remaining d−k bits. There are 2^(d−k) distinct processors that have identical values at the k fixed bit positions. A p-processor hypercube has the property that every processor has log p communication links, one each to a processor whose label differs in one bit position. To prove that the 2^(d−k) processors are connected in a hypercube topology, we need to prove that each processor in a group has d−k communication links going to other processors in the same group. Since the selected k bits are fixed for each processor in the g
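
The parity argument of Problem 12 and the link count of Problem 13 can be checked by brute force on small hypercubes. The following is a minimal sketch in C, not part of the original solutions; the function names (count_ones, edges_flip_parity, links_within_group) and the use of a bit mask to mark the k fixed label bits are assumptions made for illustration.

```c
#include <assert.h>

/* Number of ones in a node label; its parity is count_ones(x) % 2. */
static int count_ones(unsigned x) {
    int c = 0;
    while (x) {
        c += (int)(x & 1u);
        x >>= 1;
    }
    return c;
}

/* Problem 12: returns 1 if every edge of the d-dimensional hypercube joins
   two labels of opposite parity. If so, any closed walk has an even number
   of edges, so no odd-length cycle exists. */
int edges_flip_parity(int d) {
    assert(d > 0 && d < 31);
    unsigned n = 1u << d;
    for (unsigned i = 0; i < n; i++) {
        for (int b = 0; b < d; b++) {
            unsigned j = i ^ (1u << b);   /* neighbour across dimension b */
            if ((count_ones(i) & 1) == (count_ones(j) & 1))
                return 0;
        }
    }
    return 1;
}

/* Problem 13: counts the neighbours of `node` that agree with it on every
   bit set in `fixed_mask` (the k fixed bits), i.e. the links that stay
   inside the same group. The expected count is d - k. */
int links_within_group(unsigned node, int d, unsigned fixed_mask) {
    assert(d > 0 && d < 31);
    int links = 0;
    for (int b = 0; b < d; b++) {
        unsigned neighbour = node ^ (1u << b);
        if (((neighbour ^ node) & fixed_mask) == 0)
            links++;
    }
    return links;
}
```

For example, with d = 4 and the two low-order bits fixed (fixed_mask = 0x3, so k = 2), links_within_group returns 2 = d − k for any node, which matches the degree of a (d−k)-dimensional hypercube.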
