Introduction to Parallel Programming: MapReduce

Except where otherwise noted, all portions of this work are Copyright (c) 2007 Google and are licensed under the Creative Commons Attribution 3.0 License.

Serial vs. Parallel Programming
• In the early days of computing, programs were serial, that is, a program consisted of a sequence of instructions, where each instruction executed one after the other. It ran from start to finish on a single processor.
• Parallel programming developed as a means of improving performance and efficiency.
• In a parallel program, the processing is broken up into parts, each of which can be executed concurrently.
• The instructions from each part run simultaneously on different CPUs.
• These CPUs can exist on a single machine, or they can be CPUs in a set of computers connected via a network.

Why do it?
• Not only are parallel programs faster, they can also be used to solve problems on large data sets using non-local resources.
• When you have a set of computers connected on a network, you have a vast pool of CPUs, and you often have the ability to read and write very large files (assuming a distributed file system is also in place).

How to do it?
• The first step in building a parallel program is identifying sets of tasks that can run concurrently and/or partitions of data that can be processed concurrently.
• Sometimes it's just not possible. Consider a Fibonacci function: F(k+2) = F(k) + F(k+1). A function to compute this based on the form above cannot be parallelized, because each computed value is dependent on previously computed values.

How to do it?
• A common situation is having a large amount of consistent data which must be processed.
• If the data can be decomposed into equal-size partitions, we can devise a parallel solution.
• Consider a huge array which can be broken up into sub-arrays…
• If the same processing is required for each array element, with no dependencies in the computations, and no communication required between tasks, we have an ideal parallel computing opportunity.

Implementation Strategy: Master/Worker
• The MASTER:
  – initializes the array and splits it up according to the number of available WORKERS
  – sends each WORKER its subarray
  – receives the results from each WORKER
• The WORKER:
  – receives the subarray from the MASTER
  – performs processing on the subarray
  – returns results to the MASTER

Load Balancing
• The Master/Worker model implements static load balancing, which is commonly used if all tasks are performing the same amount of work on identical machines.
• Load balancing refers to techniques which try to spread tasks among the processors in a parallel system, to avoid some processors being idle while others have tasks queueing up for execution.
• A static load balancer allocates processes to processors at run time while taking no account of current network load.
• Dynamic algorithms are more flexible, though more computationally expensive, and give some consideration to the network load before allocating a new process to a processor.

Classic Example 1
• Consider one of the methods for approximating π. The first step is to inscribe a circle inside a square:

Some Calculations
• Area of the square: As = (2r)² = 4r²
• Area of the circle: Ac = π · r²
• So: π = Ac / r², and since r² = As / 4, we get π = 4 · Ac / As

"Parallelizing…"
1. Randomly generate points in the square
2. Count the number of generated points that are both in the circle and in the square
3. r = the number of points in the circle divided by the number of points in the square
4. π = 4 * r

"Parallelizing…"
NUMPOINTS = 100000;   // some large number - the bigger, the closer the approximation
p = number of WORKERS;
numPerWorker = NUMPOINTS / p;
countCircle = 0;      // one of these for each WORKER

// each WORKER does the following:
for (i = 0; i < numPerWorker; i++) {
    generate 2 random numbers that lie inside the square;
    xcoord = first random number;
    ycoord = second random number;
    if (xcoord, ycoord) lies inside the circle
        countCircle++;
}

MASTER:
    receives from WORKERS their countCircle values
    computes PI from these values: PI = 4.0 * countCircle / NUMPOINTS;
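The pseudocode above is language-neutral; the following is a minimal runnable sketch of the same master/worker scheme in Python, assuming a multiprocessing.Pool stands in for the networked WORKERS. The names count_circle, NUM_POINTS and NUM_WORKERS are illustrative and are not part of the original pseudocode.

import random
from multiprocessing import Pool

NUM_POINTS = 100000    # some large number - the bigger, the closer the approximation
NUM_WORKERS = 4        # p = number of WORKERS

def count_circle(num_per_worker):
    """WORKER: count how many random points in the square fall inside the circle."""
    random.seed()  # re-seed so forked workers do not share one random sequence
    count = 0
    for _ in range(num_per_worker):
        # generate a point in the square of side 2 centred on the origin
        xcoord = random.uniform(-1.0, 1.0)
        ycoord = random.uniform(-1.0, 1.0)
        if xcoord * xcoord + ycoord * ycoord <= 1.0:  # inside the circle of radius 1
            count += 1
    return count

if __name__ == "__main__":
    # MASTER: split the points among the workers, collect their counts, compute pi
    num_per_worker = NUM_POINTS // NUM_WORKERS
    with Pool(NUM_WORKERS) as pool:
        counts = pool.map(count_circle, [num_per_worker] * NUM_WORKERS)
    pi = 4.0 * sum(counts) / (num_per_worker * NUM_WORKERS)
    print(pi)

As in the pseudocode, each WORKER returns only a single integer count, so the MASTER's communication cost stays small no matter how many points are generated.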
Classic Example 2: MapReduce
• The MapReduce programming model derives from the map and reduce combinators from a functional language like Lisp.
• In Lisp, a map takes as input a function and a sequence of values. It then applies the function to each value in the sequence.
• A reduce combines all the elements of a sequence using a binary operation. For example, it can use + to add up all the elements in the sequence.
• MapReduce is inspired by these concepts.
• It developed within Google as a mechanism for processing large amounts of raw data, for example, crawled documents or web request logs.
• This data is so large that it must be distributed across thousands of machines in order to be processed in a reasonable time.
• This distribution implies parallel computing, since the same computations are performed on each CPU, but with a different dataset.
• MapReduce is an abstraction that allows Google engineers to perform simple computations while hiding the details of parallelization, data distribution, load balancing and fault tolerance.

Map
• Map, written by a user of the MapReduce library, takes an input pair and produces a set of intermediate key/value pairs.
• The MapReduce library groups together all intermediate values associated with the same intermediate key I and passes them to the reduce function.

Reduce
• The reduce function, also written by the user, accepts an intermediate key I and a set of values for that key. It merges together these values to form a possibly smaller set of values.

Example
• Consider the problem of counting the number of occurrences of each word in a large collection of documents:

map(String key, String value):
    // key: document name
    // value: document contents
    for each word w in value:
        EmitIntermediate(w, "1");
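To make the example concrete, here is a minimal single-process sketch in Python of how such a word count could work: map_fn emits (word, 1) pairs, a dictionary imitates the grouping by key that the MapReduce library performs, and reduce_fn sums the values for each word. The function names and sample documents are illustrative assumptions, not Google's actual API.

from collections import defaultdict

def map_fn(key, value):
    # key: document name, value: document contents
    for word in value.split():
        yield (word, 1)

def reduce_fn(key, values):
    # key: a word, values: the list of counts emitted for that word
    return key, sum(values)

documents = {"doc1": "the quick brown fox", "doc2": "the fox ate the dog"}

# map phase: run map_fn over every document and group the intermediate
# key/value pairs by key, as the MapReduce library would
intermediate = defaultdict(list)
for name, contents in documents.items():
    for word, count in map_fn(name, contents):
        intermediate[word].append(count)

# reduce phase: merge the values associated with each key
word_counts = dict(reduce_fn(word, counts) for word, counts in intermediate.items())
print(word_counts)  # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'ate': 1, 'dog': 1}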