Computer Architecture: End-of-Chapter Exercises

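1.1 Three enhancements with the following speedups are proposed for a new architecture:

Speedup1 = 30
Speedup2 = 20
Speedup3 = 15

Only one enhancement is usable at a time.

(1) If enhancements 1 and 2 are each usable for 25% of the time, what fraction of the time must enhancement 3 be used to achieve an overall speedup of 10?
(2) Assume the enhancements can be used 25%, 35% and 10% of the time for enhancements 1, 2, and 3, respectively. For what fraction of the reduced execution time is no enhancement in use?
(3) Assume, for some benchmark, the possible fraction of use is 15% for each of enhancements 1 and 2 and 70% for enhancement 3. We want to maximize performance. If only one enhancement can be implemented, which should it be? If two enhancements can be implemented, which should be chosen?

Answer:

(1) Let x be the fraction of the time enhancement 3 must be used to achieve an overall speedup of 10. Starting from

$$\text{Speedup}_{overall} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$

and summing over the three enhancements:

$$10 = \frac{1}{(1 - 25\% - 25\% - x) + \dfrac{25\%}{30} + \dfrac{25\%}{20} + \dfrac{x}{15}}$$

So x ≈ 45%.

(2) Let Time_before be the total execution time before the three enhancements are applied, and Time_no the execution time during which no enhancement is in use:

$$\text{Time}_{no} = (1 - 25\% - 35\% - 10\%) \times \text{Time}_{before}$$

Let Time_after be the total execution time after the three enhancements are applied:

$$\text{Time}_{after} = \text{Time}_{no} + \frac{25\%}{30} \times \text{Time}_{before} + \frac{35\%}{20} \times \text{Time}_{before} + \frac{10\%}{15} \times \text{Time}_{before}$$

So

$$\frac{\text{Time}_{no}}{\text{Time}_{after}} = \frac{0.30}{0.3325} = 90.2\%$$

(3) Using the same formula as in (1).

If only one enhancement can be implemented:

$$\text{Speedup}_{overall,1} = \frac{1}{(1 - 15\%) + \dfrac{15\%}{30}} = 1.17$$

$$\text{Speedup}_{overall,2} = \frac{1}{(1 - 15\%) + \dfrac{15\%}{20}} = 1.166$$

$$\text{Speedup}_{overall,3} = \frac{1}{(1 - 70\%) + \dfrac{70\%}{15}} = 2.88$$

So we must select enhancement 3 to maximize performance.

If two enhancements can be implemented:

$$\text{Speedup}_{overall,12} = \frac{1}{(1 - 15\% - 15\%) + \dfrac{15\%}{30} + \dfrac{15\%}{20}} = 1.40$$

$$\text{Speedup}_{overall,13} = \frac{1}{(1 - 15\% - 70\%) + \dfrac{15\%}{30} + \dfrac{70\%}{15}} = 4.96$$

$$\text{Speedup}_{overall,23} = \frac{1}{(1 - 15\% - 70\%) + \dfrac{15\%}{20} + \dfrac{70\%}{15}} = 4.90$$

So we must select enhancements 1 and 3 to maximize performance.

1.2 Suppose there is a graphics operation that accounts for 10% of execution time in an application, and by adding special hardware we can speed this up by a factor of 18. Furthermore, we could use twice as much hardware and make the graphics operation run 36 times faster. Explain whether it is worth exploring such a further architectural change.

Answer:

$$\text{Speedup}_{overall} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$

$$\text{Speedup}_{overall,1} = \frac{1}{(1 - 10\%) + \dfrac{10\%}{18}} = \frac{1}{0.9 + 0.0055555} = 1.104$$

$$\text{Speedup}_{overall,2} = \frac{1}{(1 - 10\%) + \dfrac{10\%}{36}} = \frac{1}{0.9 + 0.0027777} = 1.108$$

Doubling the hardware raises the overall speedup only from 1.104 to 1.108, so it is not worth exploring such a further architectural change.

1.3 In many practical applications that demand a real-time response, the computational workload W is often fixed. As the number of processors increases in a parallel computer, the fixed workload is distributed to more processors for parallel execution. Assume 20 percent of W must be executed sequentially, and 80 percent can be executed by 4 nodes simultaneously. What is the fixed-load speedup?

Answer:

$$\text{Speedup}_{overall} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$

$$\text{Speedup}_{overall} = \frac{W}{W \times 20\% + \dfrac{W \times 80\%}{4}} = \frac{1}{0.2 + 0.2} = 2.5$$

So the fixed-load speedup is 2.5.

2.1 There is a model machine with nine instructions, whose frequencies are ADD (0.3), SUB (0.24), JOM (0.06), STO (0.07), JMP (0.07), SHR (0.02), CIL (0.03), CLA (0.2), STP (0.01), respectively. There are several GPRs in the machine. Memory is byte addressable, with accessed addresses aligned, and the memory word width is 16 bits. Suppose the nine instructions have the following characteristics:

Operand format: R (register) - R (register) - M (memory)

A. Encode the nine instructions with Huffman coding, and give the average code length.
B. Design practical instruction codes, and give the average code length.
C. Write the two instruction word formats in detail.
D. What is the maximum offset for accessing a memory address?

Answer:

(A) Huffman coding from the Huffman tree:

Frequency   Instruction   Code
30%         ADD           01
24%         SUB           11
20%         CLA           10
6%          JOM           0001
7%          STO or JMP    0011
7%          STO or JMP    0010
2%          SHR           000001
3%          CIL           00001
1%          STP           000000

So the average code length is

$$\sum_{i=1}^{9} p_i \times l_i = 2.61 \text{ bits}$$

(B) Extended coding with two instruction lengths. The 2-bit pattern 11 is reserved as the prefix of the 5-bit codes, so the three short codes are taken from 00, 01 and 10:

Frequency   Instruction   Code
30%         ADD           01
24%         SUB           00
20%         CLA           10
6%          JOM           11000
7%          STO or JMP    11001
7%          STO or JMP    11010
2%          SHR           11011
3%          CIL           11100
1%          STP           11101

So the average code length is

$$(30\% + 24\% + 20\%) \times 2 + (6\% + 7\% + 7\% + 2\% + 3\% + 1\%) \times 5 = 2.78 \text{ bits}$$

(C) Shorter instruction format (8 bits):

Opcode 2 bits | Register 3 bits | Register 3 bits

Longer instruction format (16 bits):

Opcode 5 bits | Register 3 bits | Register 3 bits | Offset 5 bits

(D) With a 5-bit offset field, the offset spans 2^5 = 32 bytes, so the maximum offset for accessing a memory address is 32 bytes.

3.1 Identify all of the data dependences in the following code. Which dependences are data hazards that will be resolved via forwarding?

ADD R2, R5, R4
ADD R4, R2, R5
SW  R5, 100(R2)
ADD R3, R2, R4

Answer:

3.2 How could we modify the following code to make use of a delayed branch slot?

Loop: LW   R2, 100(R3)
      ADDI R3, R3, #4
      BEQ  R3, R4, Loop

Answer: Hoist one copy of the LW above the loop and move the LW itself into the delayed branch slot:

      LW   R2, 100(R3)
Loop: ADDI R3, R3, #4
      BEQ  R3, R4, Loop
      LW   R2, 100(R3)    ; delayed branch slot

3.3 Consider the following reservation table for a four-stage pipeline with a clock cycle t = 20 ns.

A. What are the forbidden latencies and the initial collision vector?
B. Draw the state transition diagram for scheduling the pipeline.
C. Determine the MAL associated with the shortest greedy cycle.
D. Determine the pipeline maximum throughput corresponding to the MAL and the given t.

[Reservation table: columns are clock cycles 1 to 6, rows are stages s1 to s4, with seven × marks in total; the positions of the marks are not recoverable from the source.]

Answer:

A. The forbidden latencies are F = {1, 2, 5}; the initial collision vector is C = (1001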
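
The speedup arithmetic in Problems 1.1 to 1.3 is easy to get wrong by hand, so the following is a minimal Python sketch for checking it. The helper name `amdahl_speedup` and its `(fraction, speedup)` pair interface are illustrative assumptions, not part of the exercises; the formula is the Speedup_overall expression used in the answers, extended to several mutually exclusive enhancements.

```python
def amdahl_speedup(parts):
    """Overall speedup for mutually exclusive enhancements.

    `parts` is a list of (fraction, speedup) pairs, where `fraction` is the
    share of the ORIGINAL execution time that the enhancement can accelerate.
    (Hypothetical helper, named here only for illustration.)
    """
    enhanced = sum(f for f, _ in parts)
    if enhanced < 0.0 or enhanced > 1.0 + 1e-12:
        raise ValueError("fractions must sum to a value in [0, 1]")
    return 1.0 / ((1.0 - enhanced) + sum(f / s for f, s in parts))


S1, S2, S3 = 30.0, 20.0, 15.0

# Problem 1.1 (1): solve 1/target = (0.5 - x) + 0.25/S1 + 0.25/S2 + x/S3 for x.
target = 10.0
fixed = (1.0 - 0.25 - 0.25) + 0.25 / S1 + 0.25 / S2
x = (fixed - 1.0 / target) / (1.0 - 1.0 / S3)

# Problem 1.1 (2): share of the reduced execution time with no enhancement.
time_no = 1.0 - 0.25 - 0.35 - 0.10
time_after = time_no + 0.25 / S1 + 0.35 / S2 + 0.10 / S3
ratio_no = time_no / time_after

# Problem 1.1 (3): single enhancements and pairs.
single = {
    "1": amdahl_speedup([(0.15, S1)]),
    "2": amdahl_speedup([(0.15, S2)]),
    "3": amdahl_speedup([(0.70, S3)]),
}
pairs = {
    "1+2": amdahl_speedup([(0.15, S1), (0.15, S2)]),
    "1+3": amdahl_speedup([(0.15, S1), (0.70, S3)]),
    "2+3": amdahl_speedup([(0.15, S2), (0.70, S3)]),
}

# Problem 1.2: 18x versus 36x graphics hardware on 10% of the run time.
gfx_18 = amdahl_speedup([(0.10, 18.0)])
gfx_36 = amdahl_speedup([(0.10, 36.0)])

# Problem 1.3: 80% of the fixed workload spread over 4 nodes.
fixed_load = amdahl_speedup([(0.80, 4.0)])

print(f"1.1(1) x = {x:.4f}")
print(f"1.1(2) Time_no / Time_after = {ratio_no:.4f}")
print("1.1(3) best single:", max(single, key=single.get))
print("1.1(3) best pair:  ", max(pairs, key=pairs.get))
print(f"1.2    18x -> {gfx_18:.3f}, 36x -> {gfx_36:.3f}")
print(f"1.3    fixed-load speedup = {fixed_load:.2f}")
```

Passing the fractions as shares of the original execution time mirrors how the exercises state them; a variant that takes shares of the enhanced time would need a different formula.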
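
The average code lengths in Problem 2.1 can be cross-checked in the same way. The sketch below assumes the frequencies are given as integer percentages (to avoid floating-point ties) and uses a standard heap-based Huffman construction; the function names `huffman_lengths` and `average_length` are hypothetical. It computes code lengths only, so the individual bit patterns may differ from the ones in the answer while the average length stays the same.

```python
import heapq
from itertools import count

# Instruction frequencies from Problem 2.1, as integer percentages.
FREQ = {"ADD": 30, "SUB": 24, "JOM": 6, "STO": 7, "JMP": 7,
        "SHR": 2, "CIL": 3, "CLA": 20, "STP": 1}


def huffman_lengths(freq):
    """Return {symbol: code length in bits} for a Huffman code over `freq`."""
    tie = count()  # tie-breaker so the heap never has to compare lists
    heap = [(p, next(tie), [name]) for name, p in freq.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freq, 0)
    while len(heap) > 1:
        p1, _, names1 = heapq.heappop(heap)
        p2, _, names2 = heapq.heappop(heap)
        # Every symbol under the merged node moves one level deeper.
        for name in names1 + names2:
            lengths[name] += 1
        heapq.heappush(heap, (p1 + p2, next(tie), names1 + names2))
    return lengths


def average_length(freq, lengths):
    """Frequency-weighted average code length in bits."""
    return sum(freq[name] * lengths[name] for name in freq) / sum(freq.values())


# Part A: Huffman coding.
huff = huffman_lengths(FREQ)
avg_huffman = average_length(FREQ, huff)

# Part B: extended coding, 2-bit codes for the three most frequent
# instructions and 5-bit codes for the remaining six.
ranked = sorted(FREQ, key=FREQ.get, reverse=True)
extended = {name: (2 if i < 3 else 5) for i, name in enumerate(ranked)}
avg_extended = average_length(FREQ, extended)

print(f"Huffman average length:  {avg_huffman:.2f} bits")
print(f"Extended average length: {avg_extended:.2f} bits")
```

The gap between the two averages is the price paid for restricting the opcode to two fixed lengths, which keeps instruction decoding simple.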
