OpenMP: A Portable Solution for Threading
ZJU-Intel Embedded Technology Center

• Compiler directives for multithreaded programming
• Easy to create threaded Fortran and C/C++ codes
• Supports data parallelism model
• Incremental parallelism
• Combines serial and parallel code in single source

    omp_set_lock(lck)
    #pragma omp parallel for private(A, B)
    #pragma omp critical
    C$OMP parallel do shared(a, b, c)
    C$OMP PARALLEL REDUCTION (+: A, B)
    call OMP_INIT_LOCK(ilok)
    call omp_test_lock(jlok)
    setenv OMP_SCHEDULE "dynamic"
    CALL OMP_SET_NUM_THREADS(10)
    C$OMP DO lastprivate(XX)
    C$OMP ORDERED
    C$OMP SINGLE PRIVATE(X)
    C$OMP SECTIONS
    C$OMP MASTER
    C$OMP ATOMIC
    C$OMP FLUSH
    C$OMP PARALLEL DO ORDERED PRIVATE (A, B, C)
    C$OMP THREADPRIVATE(/ABC/)
    C$OMP PARALLEL COPYIN(/blk/)
    Nthrds = OMP_GET_NUM_PROCS()
    !$OMP BARRIER
(combined C/C++ and Fortran)

Architecture
• Fork-join model
• Work-sharing constructs
• Data environment constructs
• Synchronization constructs
• Extensive Application Program Interface (API) for finer control

• Master thread spawns a team of threads as needed
• Parallelism is added incrementally: the sequential program evolves into a parallel program
[Figure labels: Parallel Regions, Master Thread]

Pragma Syntax
Most constructs in OpenMP are compiler directives or pragmas.
For C and C++, the pragmas take the form:
    #pragma omp construct [clause [clause]…]

• Defines parallel region over structured block of code
• Threads are created as 'parallel' pragma is crossed
• Threads block at end of region
• Data is shared among threads unless specified otherwise
C/C++:
    #pragma omp parallel
    {
        block
    }
[Figure labels: #pragma omp parallel, Thread 1, Thread 2, Thread 3]

• Set environment variable for number of threads
    set OMP_NUM_THREADS=4
• There is no standard default for this variable
• Many systems: # of threads = # of processors
• Intel® compilers use this default

• Splits loop iterations into threads
• Must be in the parallel region
• Must precede the loop
    #pragma omp parallel
    #pragma omp for
    for (I = 0; I < N; I++) {
        Do_Work(I);
    }

• Threads are assigned an independent set of iterations
• Threads must wait at the end of work-sharing construct
    #pragma omp parallel
    #pragma omp for
    for (i = 1; i < 13; i++)
        c[i] = a[i] + b[i];
[Figure labels: #pragma omp parallel, #pragma omp for, i = 1 through i = 12, Implicit barrier]

These two code segments are equivalent
    #pragma omp parallel
    {
        #pragma omp for
        for (i = 0; i < MAX; i++) {
            res[i] = huge();
        }
    }

    #pragma omp parallel for
    for (i = 0; i < MAX; i++) {
        res[i] = huge();
    }

• OpenMP uses a shared-memory programming model
• Most variables are shared by default
• Global variables are shared among threads
    C/C++: File scope variables, static

But, not everything is shared...
• Stack variables in functions called from parallel regions are PRIVATE
• Automatic variables within a statement block are PRIVATE
• Loop index variables are private (with exceptions)
    C/C++: The first loop index variable in nested loops following a #pragma omp for

• The default status can be modified with
    default (shared | none)
• Scoping attribute clauses
    shared(varname, …)
    private(varname, …)

• Reproduces the variable for each thread
• Variables are un-initialized; C++ object is default constructed
• Any value external to the parallel region is undefined
    /* a and b are assumed to be arrays visible at file scope */
    void work(float *c, int N)
    {
        float x, y;
        int i;
        #pragma omp parallel for private(x, y)
        for (i = 0; i < N; i++) {
            x = a[i];
            y = b[i];
            c[i] = x + y;
        }
    }

• Must protect access to shared, modifiable data
    float dot_prod(float *a, float *b, int N)
    {
        float sum = 0.0;
        #pragma omp parallel for shared(sum)
        for (int i = 0; i < N; i++) {
            #pragma omp critical
            sum += a[i] * b[i];
        }
        return sum;
    }

OpenMP Critical Construct
    #pragma omp critical [(lock_name)]
• Defines a critical region on a structured block
    float RES;
    #pragma omp parallel
    {
        float B;
        #pragma omp for
        for (int i = 0; i < niters; i++) {
            B = big_job(i);
            #pragma omp critical (RES_lock)
            consum(B, RES);
        }
    }
• Threads wait their turn; only one at a time calls consum(), thereby protecting RES from race conditions
• Naming the critical construct RES_lock is optional

Reduction Clause
    reduction (op : list)
• The variables in "list" must be shared in the enclosing parallel region
• Inside parallel or work-sharing construct:
    • A PRIVATE copy of each list variable is created and initialized depending on the "op"
    • These copies are updated locally by threads
    • At end of construct, local copies are combined through "op" into a single …
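
The reduction clause can be illustrated by rewriting the earlier dot product so that it no longer needs a critical section. This is only a minimal sketch: the function name dot_prod_reduction is hypothetical and does not appear in the slides. Each thread accumulates into its own private copy of sum (initialized to 0 for the + operator), and the copies are added together when the loop ends.

```c
/* Hypothetical example, not taken from the slides: dot product with a
 * reduction clause instead of "#pragma omp critical". */
float dot_prod_reduction(const float *a, const float *b, int n)
{
    float sum = 0.0f;

    /* Each thread gets a private copy of sum, initialized to 0 for "+".
     * The private copies are combined into the shared sum at loop end. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }

    return sum;
}
```

With GCC or Clang this would typically be compiled with -fopenmp. Without that flag the pragma is ignored and the loop runs serially, computing the same sum. Because floating-point addition is not associative, the parallel result may differ in the last bits from a serial run for general inputs.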
Document title: Zhejiang University Multicore Computing Courseware 06
Link: https://www.777doc.com/doc-3103728 .html