Intel® Compilers for Xeon™ Processor

Agenda
– General
– Xeon™ processor optimizations
– Loop level optimizations
– Multi-pass optimizations
– Other

General Optimizations
– /Od, -O0: disable optimizations
– /Zi, -g: create symbols
– /O1, -O1: optimize for speed without increasing code size (i.e. disables library function inlining)
– /O2, -O2: optimize for speed (default)
– /O3, -O3: high-level optimizations

Instruction Scheduling
Schedule instructions to be optimal for specific processor instruction latencies and cache sizes.

Windows         Linux            Processor
-G5             -tpp5            Pentium® processors and Pentium processors with MMX™ technology
-G6 (Default)   -tpp6 (Default)  Pentium Pro, Pentium II and Pentium III processors
-G7             -tpp7            Pentium 4 processor

Note: the default may change in future compilers.

Shift/Multiply Latency
Pentium
– Shift has ~1x the latency of adds
– Multiply has ~10x the latency of adds
Pentium Pro, II, and III
– Shift has ~1x the latency of adds
– Multiply has ~3x the latency of adds
Pentium 4 (may change in future releases)
– Shift has ~8x the latency of adds
– Multiply has ~26x the latency of adds

Under the Covers: P4
The compiler accounts for these differences for you!

    for (int i = 0; i < length; i++) {
        p[i] = q[i] * 32;
    }

    .B1.7:                          # -tpp6
        movl  (%ebx,%edx,4), %eax
        shll  $5, %eax
        movl  %eax, (%esi,%edx,4)
        incl  %edx
        cmpl  %ecx, %edx
        jl    .B1.7

    .B1.7:                          # -tpp7
        movl  (%ebx,%edx,4), %eax
        addl  %eax, %eax
        addl  %eax, %eax
        addl  %eax, %eax
        addl  %eax, %eax
        addl  %eax, %eax
        movl  %eax, (%esi,%edx,4)
        addl  $1, %edx
        cmpl  %ecx, %edx
        jl    .B1.7

With -tpp6 the multiply by 32 becomes a single shift left by 5. With -tpp7 it becomes five self-additions (each one doubles %eax), because on the Pentium 4 a shift costs about 8x the latency of an add.

[Figure: Under the Covers: Xeon]

Which Processor: [a]x?

Use  Windows*  Linux*  To require at least...
i    -Qaxi     -axi    Pentium Pro and Pentium II processors with CMOV and FCMOV instructions
M    -QaxM     -axM    Pentium processors with MMX instructions
K    -QaxK     -axK    Pentium III processor with Streaming SIMD Extensions (implies i and M above)
W    -QaxW     -axW    Pentium 4 processor with Streaming SIMD Extensions 2 (implies i, M and K above)

Automatic Processor Dispatch
Single executable
– Pentium 4 target that runs on all x86 processors
For the target processor it uses:
– Processor-specific opcodes
– Prefetch (Pentium III only)
– Vectorization
Low overhead
– Some increase in code size
Can mix and match: -xK -axW together make Xeon/Pentium 4 the target and Pentium III the default.

Vectorization
– Automatically converts loops to use MMX/SSE/SSE2 instructions and registers.
– Data types: char/short/int/float/double (but not mixed)
– Can use the Short Vector Math Library
– Enabled through -[Q]xW, -[Q]xK, -[Q]axW, -[Q]axK
– -vec_report3 tells you which loops were vectorized, and if not, why not.

High Level Optimizer
• Windows: /O3 or Linux: -O3
• Use with -xW, -xK, -QxW, -QxK, etc.
– additional loop optimizations
– more aggressive dependency analysis
– scalar replacement
– software prefetch (-xK on Pentium III)
Loops must meet criteria related to those for vectorization.

[Figure: Under the Covers: Xeon]

SMP parallelism
OpenMP
– Easy multithreading using directives
– Use KSL tools for development
– Use Intel tools to optimize for IA in tandem with OpenMP
Auto-parallelization
– Simple loops threaded by the compiler alone
Loops must meet certain criteria…

OpenMP* Support
OpenMP 1.1 for Fortran & 1.0 for C/C++
– Debugger info support for OpenMP
– Assure for Threads supported with the Intel Compiler
OpenMP switches:
– -Qopenmp, -openmp (or -openmpP)
– -QopenmpS, -openmpS (serial, for debugging)
– -openmp_report[n] (diagnostics)
– works in conjunction with vectorization

Auto Parallelization
Auto-parallelization: automatic threading of loops without having to manually insert OpenMP* directives.
– -Qparallel (Windows*), -parallel (Linux*)
– -Qpar_report[n], -par_report[n] (diagnostics)
Better to use OpenMP directives
– The compiler can identify "easy" candidates for parallelization, but large applications are difficult to analyze.

Agenda
– General and processor optimization
– Loop level optimizations
– Multi-pass optimizations
  – Inter-Procedural Optimization
  – Profile-Guided Optimization
– Other

Inter-Procedural Optimizations (IPO)
– -Qip, -ip: enables interprocedural optimizations for single-file compilation.
– -Qipo, -ipo: enables interprocedural optimizations across files.

Inter-Procedural Optimizations (IPO)
More benefits than just inlining
– Partial inlining
– Interprocedural constant propagation
– Passing arguments in registers
– Loop-invariant code motion
– Dead code elimination
– Helps vectorization, memory disambiguation

IPO Usage: 2 Step Process
Pass 1 (compiling) produces virtual .obj and .il files; Pass 2 (linking) produces the executable.

Compiling:
    Windows*: icl -c /Qipo main.c func1.c func2.c
    Linux*:   icc -c -ipo main.c func1.c func2.c
Linking:
    Windows*: icl /Qipo main.obj func1.obj func2.obj
    Linux*:   icc -ipo main.o func1.o func2.o

Windows* hint: LINK=link.exe should be replaced with LINK=xilink.exe, i.e.:
    xilink /Qipo <link commands> main.obj func1.obj func2.obj

Profile-Guided Optimizations (PGO)
– Use execution-time feedback to guide optimization
– Helps I-cache, paging, branch prediction
Enabled optimizations:
– Basic block ordering
– Better register allocation
– Better decision of functions to inline
– Function ordering
– Switch-statement optimization
– Better vectorization decisions

Instrumented Compilation
    Windows: icl /Qprof_gen prog.c
    Linux:   icc -prof_gen prog.c
Instrumented Execution
    prog.exe (on a typical dataset)
Feedback Compilation
    Windows: icl /Qprof_use prog.c
    Linux:   icc -prof_use prog.c

– DYN file containing dynamic info: .dyn
– Instrumented executable: prog.exe
– Merged DYN summary file: .dpi

De
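As an illustration of the kind of code the PGO steps above are aimed at, here is a minimal sketch of what part of prog.c might contain. The function names (classify, run_workload) and the assumption that opcode 0 dominates the input are hypothetical and not taken from the slides. The sketch only shows a skewed switch statement and a rarely taken branch, which are the patterns that basic block ordering and switch-statement optimization are listed as addressing. What a given compiler version actually does with the profile is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcode handler. In the assumed workload, opcode 0 is by far
   the most frequent case; a profile from an instrumented run would record
   that skew. */
int classify(int opcode)
{
    switch (opcode) {
    case 0:  return 1;   /* hot case in the assumed input */
    case 1:  return 2;
    case 2:  return 4;
    default: return -1;  /* rare: unknown opcode */
    }
}

/* Sums the handler results over a buffer of opcodes. Unknown opcodes are
   counted in *errors and contribute nothing to the sum. */
long run_workload(const int *ops, size_t n, size_t *errors)
{
    long total = 0;
    size_t i;

    assert(errors != NULL);
    *errors = 0;

    for (i = 0; i < n; i++) {
        int r = classify(ops[i]);
        if (r < 0) {          /* rarely taken branch in the assumed input */
            (*errors)++;
            continue;
        }
        total += r;
    }
    return total;
}
```

With a main that feeds run_workload a typical dataset, such a file would go through the instrumented compilation, instrumented execution, and feedback compilation steps listed above. The point of using a representative dataset is that the recorded branch and case frequencies should match production behavior.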
Document title: 24_Compiler Parameter Tuning Methods
Link: https://www.777doc.com/doc-3357001.html