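Computer Architecture: End-of-Chapter Exercises

1.1 Three enhancements with the following speedups are proposed for a new architecture:

Speedup1 = 30, Speedup2 = 20, Speedup3 = 15

Only one enhancement is usable at a time.

(1) If enhancements 1 and 2 are each usable for 25% of the time, what fraction of the time must enhancement 3 be used to achieve an overall speedup of 10?
(2) Assume the enhancements can be used 25%, 35%, and 10% of the time for enhancements 1, 2, and 3, respectively. For what fraction of the reduced execution time is no enhancement in use?
(3) Assume, for some benchmark, the possible fraction of use is 15% for each of enhancements 1 and 2 and 70% for enhancement 3. We want to maximize performance. If only one enhancement can be implemented, which should it be? If two enhancements can be implemented, which should be chosen?

Answer:

(1) Let x be the fraction of the time that enhancement 3 must be used to achieve an overall speedup of 10. By Amdahl's law,

$$\text{Speedup}_{\text{overall}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$

so

$$10 = \frac{1}{(1 - 25\% - 25\% - x) + \dfrac{25\%}{30} + \dfrac{25\%}{20} + \dfrac{x}{15}}$$

Setting the denominator equal to 1/10 gives $0.5208 - \frac{14}{15}x = 0.1$, so $x \approx 0.451$, i.e. x ≈ 45%.

(2) Let $\text{Time}_{\text{before}}$ be the total execution time before the enhancements are used, and $\text{Time}_{\text{no}}$ the part of the execution time during which no enhancement is in use:

$$\text{Time}_{\text{no}} = (1 - 25\% - 35\% - 10\%) \times \text{Time}_{\text{before}} = 30\% \times \text{Time}_{\text{before}}$$

The total execution time after the enhancements are used is

$$\text{Time}_{\text{after}} = \text{Time}_{\text{no}} + \frac{25\%}{30}\,\text{Time}_{\text{before}} + \frac{35\%}{20}\,\text{Time}_{\text{before}} + \frac{10\%}{15}\,\text{Time}_{\text{before}} = 33.25\% \times \text{Time}_{\text{before}}$$

So

$$\frac{\text{Time}_{\text{no}}}{\text{Time}_{\text{after}}} = \frac{30\%}{33.25\%} \approx 90.2\%$$

(3) Apply the same formula.

If only one enhancement can be implemented:

$$\text{Speedup}_{\text{overall},1} = \frac{1}{(1 - 15\%) + \dfrac{15\%}{30}} \approx 1.17$$

$$\text{Speedup}_{\text{overall},2} = \frac{1}{(1 - 15\%) + \dfrac{15\%}{20}} \approx 1.166$$

$$\text{Speedup}_{\text{overall},3} = \frac{1}{(1 - 70\%) + \dfrac{70\%}{15}} \approx 2.88$$

So enhancement 3 should be implemented.

If two enhancements can be implemented:

$$\text{Speedup}_{\text{overall},12} = \frac{1}{(1 - 15\% - 15\%) + \dfrac{15\%}{30} + \dfrac{15\%}{20}} \approx 1.40$$

$$\text{Speedup}_{\text{overall},13} = \frac{1}{(1 - 15\% - 70\%) + \dfrac{15\%}{30} + \dfrac{70\%}{15}} \approx 4.96$$

$$\text{Speedup}_{\text{overall},23} = \frac{1}{(1 - 15\% - 70\%) + \dfrac{15\%}{20} + \dfrac{70\%}{15}} \approx 4.90$$

So enhancements 1 and 3 should be chosen to maximize performance.

1.2 Suppose there is a graphics operation that accounts for 10% of the execution time in an application, and by adding special hardware we can speed this up by a factor of 18. Further, we could use twice as much hardware and make the graphics operation run 36 times faster. Is it worth exploring such a further architectural change? Give the reason.

Answer:

$$\text{Speedup}_{\text{overall},1} = \frac{1}{(1 - 10\%) + \dfrac{10\%}{18}} = \frac{1}{0.9 + 0.00556} \approx 1.104$$

$$\text{Speedup}_{\text{overall},2} = \frac{1}{(1 - 10\%) + \dfrac{10\%}{36}} = \frac{1}{0.9 + 0.00278} \approx 1.108$$

Doubling the hardware raises the overall speedup only from about 1.104 to 1.108 (roughly 0.3%), because the unaccelerated 90% of the execution time dominates. So it is not worth exploring such a further architectural change.

1.3 In many practical applications that demand a real-time response, the computational workload W is often fixed. As the number of processors in a parallel computer increases, the fixed workload is distributed to more processors for parallel execution. Assume 20 percent of W must be executed sequentially and 80 percent can be executed by 4 nodes simultaneously. What is the fixed-load speedup?

Answer:

$$\text{Speedup}_{\text{fixed-load}} = \frac{W}{W \times 20\% + \dfrac{W \times 80\%}{4}} = \frac{1}{0.2 + 0.2} = 2.5$$

So the fixed-load speedup is 2.5.

2.1 There is a model machine with nine instructions, whose frequencies are ADD (0.3), SUB (0.24), JOM (0.06), STO (0.07), JMP (0.07), SHR (0.02), CIL (0.03), CLA (0.2), and STP (0.01). There are several GPRs in the machine. Memory is byte addressable, with accessed addresses aligned, and the memory word width is 16 bits. Suppose the nine instructions have the following characteristics. Operand format: R (register)-R (register)-M (memory).

A. Encode the nine instructions with Huffman coding, and give the average code length.
B. Design practical instruction codes, and give the average code length.
C. Write the two instruction word formats in detail.
D. What is the maximum offset for accessing a memory address?

Answer:

(A) Huffman coding, built from the Huffman tree:

    Instruction  Frequency  Huffman code  Length
    ADD          30%        01            2
    SUB          24%        11            2
    CLA          20%        10            2
    JOM           6%        0001          4
    STO           7%        0011          4
    JMP           7%        0010          4
    SHR           2%        000001        6
    CIL           3%        00001         5
    STP           1%        000000        6

So the average code length is

$$\sum_{i=1}^{9} p_i \times l_i = 2.61 \text{ bits}$$

(B) Expanding-opcode coding with two opcode lengths (2 bits and 5 bits):

    Instruction  Frequency  Opcode  Length
    ADD          30%        01      2
    SUB          24%        00      2
    CLA          20%        10      2
    JOM           6%        11000   5
    STO           7%        11001   5
    JMP           7%        11010   5
    SHR           2%        11011   5
    CIL           3%        11100   5
    STP           1%        11101   5

The three 2-bit opcodes avoid the prefix 11, which serves as the escape for the 5-bit opcodes. So the average code length is

$$\sum_{i=1}^{9} p_i \times l_i = (0.30 + 0.24 + 0.20) \times 2 + (0.06 + 0.07 + 0.07 + 0.02 + 0.03 + 0.01) \times 5 = 2.78 \text{ bits}$$

(C) Shorter instruction format (8 bits):

    | Opcode: 2 bits | Register: 3 bits | Register: 3 bits |

Longer instruction format (16 bits):

    | Opcode: 5 bits | Register: 3 bits | Register: 3 bits | Offset: 5 bits |

(D) The offset field is 5 bits wide, so the maximum offset for accessing a memory address is 2^5 = 32 bytes.

3.1 Identify all of the data dependences in the following code. Which dependences are data hazards that will be resolved via forwarding?

    ADD R2,R5,R4
    ADD R4,R2,R5
    SW  R5,100(R2)
    ADD R3,R2,R4

Answer:

3.2 How could we modify the following code to make use of a delayed branch slot?

    Loop: LW   R2,100(R3)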
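          ADDI R3,R3,#4
          BEQ  R3,R4,Loop

Answer:

          LW   R2,100(R3)
    Loop: ADDI R3,R3,#4
          BEQ  R3,R4,Loop
          LW   R2,100(R3)    ; delayed branch slot

The first load is moved above the loop, and a copy of it fills the delayed branch slot. Because the slot executes after ADDI has incremented R3, the copy fetches the operand for the next iteration.

3.3 Consider the following reservation table for a four-stage pipeline with a clock cycle t = 20 ns.

[Reservation table: stages s1 to s4 (rows) by clock cycles 1 to 6 (columns), seven × entries]

A. What are the forbidden latencies and the initial collision vector?
B. Draw the state transition diagram for scheduling the pipeline.
C. Determine the MAL associated with the shortest greedy cycle.
D. Determine the maximum throughput of the pipeline corresponding to the MAL and the given t.

Answer:

A. The forbidden latencies are F = {1, 2, 5}, and the initial collision vector is C = (1001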
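The Amdahl's-law arithmetic of Exercise 1.1 can be checked with a short Python sketch. It assumes, as the exercise states, that only one enhancement is active at a time, so the usable fractions simply add; the helper names `overall_speedup` and `best` are illustrative and not from the source.

```python
from itertools import combinations

SPEEDUPS = (30, 20, 15)   # local speedups of enhancements 1, 2, 3

def overall_speedup(fractions, speedups):
    """Amdahl's law for enhancements that are never active at the same time.

    fractions[i] is the share of the ORIGINAL execution time in which
    enhancement i is usable; speedups[i] is that enhancement's speedup.
    """
    unenhanced = 1 - sum(fractions)
    return 1 / (unenhanced + sum(f / s for f, s in zip(fractions, speedups)))

# (1) The denominator is a - b*x; solve a - b*x = 1/10 for x.
a = (1 - 0.25 - 0.25) + 0.25 / 30 + 0.25 / 20
b = 1 - 1 / 15
x = (a - 1 / 10) / b

# (2) Share of the reduced (enhanced) execution time with no enhancement active.
fr = (0.25, 0.35, 0.10)
t_no = 1 - sum(fr)
t_after = t_no + sum(f / s for f, s in zip(fr, SPEEDUPS))
share_no = t_no / t_after

# (3) Try every set of one or two enhancements and keep the fastest.
USE = (0.15, 0.15, 0.70)

def best(k):
    """Indices (0-based) of the k enhancements giving the largest overall speedup."""
    return max(
        combinations(range(3), k),
        key=lambda idx: overall_speedup([USE[i] for i in idx],
                                        [SPEEDUPS[i] for i in idx]),
    )

print(f"(1) x = {x:.4f}")                       # (1) x = 0.4509
print(f"(2) share = {share_no:.4f}")            # (2) share = 0.9023
print("(3) best single:", [i + 1 for i in best(1)],
      "best pair:", [i + 1 for i in best(2)])   # [3] and [1, 3]
```

Plugging x back into `overall_speedup([0.25, 0.25, x], SPEEDUPS)` should return 10, which is a quick sanity check on part (1).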
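Exercises 1.2 and 1.3 both use the single-fraction form of the same formula. In this sketch, Exercise 1.3 is treated as 80% of W sped up by a factor of 4 (the node count), which is exactly what the fixed-load expression reduces to; `amdahl` is a hypothetical helper name.

```python
def amdahl(fraction, speedup):
    """Overall speedup when `fraction` of the original time runs `speedup` times faster."""
    return 1 / ((1 - fraction) + fraction / speedup)

# Exercise 1.2: 10% of the time, accelerated 18x or (with twice the hardware) 36x.
s18 = amdahl(0.10, 18)
s36 = amdahl(0.10, 36)
gain = s36 / s18 - 1          # relative benefit of doubling the hardware

# Exercise 1.3: fixed workload W, 20% sequential, 80% shared by 4 nodes.
# W / (0.2W + 0.8W/4) is the same expression with fraction 0.8 and speedup 4.
fixed_load = amdahl(0.80, 4)

print(f"1.2: {s18:.3f} -> {s36:.3f} (+{gain:.1%})")   # 1.2: 1.104 -> 1.108 (+0.3%)
print(f"1.3: fixed-load speedup = {fixed_load:.2f}")  # 1.3: fixed-load speedup = 2.50
```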
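For Exercise 2.1(A), the Huffman construction can be reproduced with Python's standard `heapq` module. This sketch computes only the code lengths: which branch is labeled 0 and which 1 is an arbitrary choice, so the codewords it implies may differ from the table, but the lengths and the 2.61-bit average should match.

```python
import heapq
from itertools import count

FREQ = {"ADD": 0.30, "SUB": 0.24, "JOM": 0.06, "STO": 0.07, "JMP": 0.07,
        "SHR": 0.02, "CIL": 0.03, "CLA": 0.20, "STP": 0.01}

def huffman_lengths(freq):
    """Return {symbol: code length} for a Huffman code over `freq`."""
    tie = count()   # unique tie-breaker, so heapq never has to compare dicts
    heap = [(p, next(tie), {sym: 0}) for sym, p in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)    # the two least frequent subtrees
        p2, _, right = heapq.heappop(heap)
        # Merging puts every symbol of both subtrees one level deeper.
        merged = {sym: depth + 1 for sym, depth in {**left, **right}.items()}
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]

lengths = huffman_lengths(FREQ)
avg = sum(FREQ[sym] * n for sym, n in lengths.items())
print(sorted(lengths.items(), key=lambda kv: kv[1]))
print(f"average length = {avg:.2f} bits")   # average length = 2.61 bits
```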
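A small check of the expanding-opcode scheme in Exercise 2.1(B) and the formats in (C): the sketch verifies that the opcodes are prefix-free (so a decoder can tell a 2-bit opcode from a 5-bit one), recomputes the average length, and sums the field widths. The field names `reg_a` and `reg_b` are placeholders of this sketch.

```python
FREQ = {"ADD": 0.30, "SUB": 0.24, "JOM": 0.06, "STO": 0.07, "JMP": 0.07,
        "SHR": 0.02, "CIL": 0.03, "CLA": 0.20, "STP": 0.01}

# Three 2-bit opcodes, and six 5-bit opcodes that share the escape prefix "11".
CODES = {"ADD": "01", "SUB": "00", "CLA": "10",
         "JOM": "11000", "STO": "11001", "JMP": "11010",
         "SHR": "11011", "CIL": "11100", "STP": "11101"}

# Field widths in bits for the two formats of part (C).
SHORT_FORMAT = {"opcode": 2, "reg_a": 3, "reg_b": 3}
LONG_FORMAT = {"opcode": 5, "reg_a": 3, "reg_b": 3, "offset": 5}

def is_prefix_free(codes):
    """True if no codeword is a prefix of (or equal to) another codeword."""
    words = list(codes.values())
    return not any(
        i != j and words[j].startswith(words[i])
        for i in range(len(words))
        for j in range(len(words))
    )

avg_len = sum(FREQ[op] * len(code) for op, code in CODES.items())

print(is_prefix_free(CODES))                    # True
print(is_prefix_free({**CODES, "SUB": "11"}))   # False: "11" is the escape prefix
print(f"average length = {avg_len:.2f} bits")   # average length = 2.78 bits
print(sum(SHORT_FORMAT.values()), sum(LONG_FORMAT.values()),
      2 ** LONG_FORMAT["offset"])               # 8 16 32
```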
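For Exercise 3.1, the dependences can be enumerated mechanically. This sketch assumes MIPS operand order (destination first for ADD; SW reads both of its registers and writes none) and a classic five-stage pipeline with full forwarding whose register file is written in the first half of a cycle and read in the second half. Under that assumption a RAW dependence is a hazard needing forwarding only when the two instructions are one or two apart; the `forward_window` parameter encodes this. WAR and WAW entries are name dependences, which do not cause hazards in such an in-order pipeline.

```python
import re

# The four instructions of Exercise 3.1.
PROGRAM = ["ADD R2,R5,R4", "ADD R4,R2,R5", "SW R5,100(R2)", "ADD R3,R2,R4"]

def decode(text):
    """Return (registers written, registers read) for one instruction."""
    op, args = text.split(maxsplit=1)
    names = re.findall(r"R\d+", args)
    if op == "SW":                       # SW Rt,offset(Rs): reads both, writes none
        return set(), set(names)
    return {names[0]}, set(names[1:])    # ALU op Rd,Rs,Rt: writes Rd, reads Rs and Rt

def rewritten_between(dec, reg, i, j):
    """True if some instruction strictly between i and j writes `reg`."""
    return any(reg in dec[k][0] for k in range(i + 1, j))

def dependences(program, forward_window=2):
    """List (i, j, register, kind) with 1-based instruction numbers i < j."""
    dec = [decode(ins) for ins in program]
    found = []
    for i in range(len(dec)):
        for j in range(i + 1, len(dec)):
            (wi, ri), (wj, rj) = dec[i], dec[j]
            pairs = ([(r, "RAW") for r in sorted(wi & rj)]
                     + [(r, "WAR") for r in sorted(ri & wj)]
                     + [(r, "WAW") for r in sorted(wi & wj)])
            for reg, kind in pairs:
                if rewritten_between(dec, reg, i, j):
                    continue             # an intervening write breaks the direct link
                if kind == "RAW":
                    kind += ", forwarded" if j - i <= forward_window else ", no hazard"
                found.append((i + 1, j + 1, reg, kind))
    return found

for dep in dependences(PROGRAM):
    print(dep)
# (1, 2, 'R2', 'RAW, forwarded')
# (1, 2, 'R4', 'WAR')
# (1, 3, 'R2', 'RAW, forwarded')
# (1, 4, 'R2', 'RAW, no hazard')
# (2, 4, 'R4', 'RAW, forwarded')
```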
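To see how the delayed branch slot in Exercise 3.2 behaves, the sketch below traces the load addresses of both loops on a machine with one delay slot. The tuple encoding, the `run` helper, and the starting values R3 = 0, R4 = 4 are all assumptions of this illustration; the original loop is given a NOP in its slot so that both versions run on the same machine model.

```python
def run(program, regs, max_steps=100):
    """Trace a tiny MIPS-like subset that has ONE branch delay slot.

    Instruction tuples (an encoding made up for this sketch):
      ("LW", rt, offset, rs)      ("ADDI", rt, rs, imm)
      ("BEQ", rs, rt, target)     ("NOP",)
    Returns the effective addresses of all loads, in execution order.
    """
    regs = dict(regs)
    loads, pc, pending = [], 0, None
    for _ in range(max_steps):
        if not 0 <= pc < len(program):
            break
        jump_to, pending = pending, None   # non-None: this is a taken branch's delay slot
        op, *args = program[pc]
        if op == "LW":
            rt, offset, rs = args
            loads.append(offset + regs[rs])
            regs[rt] = 0                   # loaded value does not matter for the trace
        elif op == "ADDI":
            rt, rs, imm = args
            regs[rt] = regs[rs] + imm
        elif op == "BEQ":
            rs, rt, target = args
            if regs[rs] == regs[rt]:
                pending = target           # redirect only after the delay slot runs
        pc = jump_to if jump_to is not None else pc + 1
    return loads

# Original loop; its delay slot has to hold a NOP.
original = [("LW", "R2", 100, "R3"), ("ADDI", "R3", "R3", 4),
            ("BEQ", "R3", "R4", 0), ("NOP",)]
# Rewritten loop: first load hoisted, a copy of the load fills the slot (Loop = index 1).
rewritten = [("LW", "R2", 100, "R3"), ("ADDI", "R3", "R3", 4),
             ("BEQ", "R3", "R4", 1), ("LW", "R2", 100, "R3")]

start = {"R2": 0, "R3": 0, "R4": 4}    # hypothetical initial register values
print(run(original, start))    # [100, 104]
print(run(rewritten, start))   # [100, 104, 108]
```

With these values the loads performed inside the loop match; the rewritten loop issues one more load after the final iteration, because the slot executes even when the branch falls through. That extra access is acceptable only if it is safe to perform and the final value of R2 does not matter.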
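Parts B to D of Exercise 3.3 follow mechanically from the forbidden set, so this sketch starts from F = {1, 2, 5} as given in answer A rather than from the reservation table. It stores the collision vector as an integer whose bit p-1 is c_p, builds the state transitions with the usual rule (shift the current state right by the chosen latency, then OR in the initial vector), follows the greedy policy of always taking the smallest permissible latency, and converts the resulting average latency into a throughput with t = 20 ns. The greedy walk here starts only from the initial state; in general, greedy cycles from every state should be compared. All function names are illustrative.

```python
def collision_vector(forbidden):
    """Bit p-1 is set iff latency p is forbidden, i.e. C = (c_m ... c_2 c_1)."""
    cv = 0
    for p in forbidden:
        cv |= 1 << (p - 1)
    return cv

def permissible(state, m):
    """Latencies 1..m that do not collide from `state`, plus m+1 meaning 'any latency > m'."""
    return [p for p in range(1, m + 2) if p > m or not (state >> (p - 1)) & 1]

def state_diagram(forbidden):
    """Edges (state, latency, next_state) reachable from the initial collision vector."""
    cv, m = collision_vector(forbidden), max(forbidden)
    edges, seen, todo = [], {cv}, [cv]
    while todo:
        state = todo.pop()
        for p in permissible(state, m):
            nxt = (state >> p) | cv       # shift right by p, then OR the initial vector
            edges.append((state, p, nxt))
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return edges

def greedy_cycle(forbidden):
    """Always take the smallest permissible latency until a state repeats."""
    cv, m = collision_vector(forbidden), max(forbidden)
    state, first_seen, lats = cv, {}, []
    while state not in first_seen:
        first_seen[state] = len(lats)
        p = permissible(state, m)[0]
        lats.append(p)
        state = (state >> p) | cv
    return lats[first_seen[state]:]

F = {1, 2, 5}   # forbidden latencies from answer A
T_NS = 20       # clock cycle t in ns
m = max(F)

print(f"C = {collision_vector(F):0{m}b}")          # C = 10011
for s, p, n in state_diagram(F):
    label = str(p) if p <= m else f">={p}"
    print(f"{s:0{m}b} -({label})-> {n:0{m}b}")

cycle = greedy_cycle(F)
mal = sum(cycle) / len(cycle)
throughput = 1e3 / (mal * T_NS)   # tasks per microsecond = million tasks per second
print(f"greedy cycle {cycle}, MAL = {mal}, throughput = {throughput:.2f} M tasks/s")
```

For F = {1, 2, 5} the only reachable state is the initial vector 10011, with self-loops for latency 3, latency 4, and any latency of 6 or more, so the greedy walk settles on the cycle (3).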