您好,欢迎访问三七文档
1AnEfficientProcedureforBuildingtheFunctionalPerformanceModelofaProcessorAlexeyLastovetsky,RaviReddy,RobertHigginsDepartmentofComputerScience,UniversityCollegeDublin,BelfieldDublin4,IrelandE-mail:Alexey.Lastovetsky@ucd.ie,Manumachu.Reddy@ucd.ie,Robert.Higgins@ucd.ieAbstract---Inthispaper,wepresentanefficientprocedureforbuildingapiecewiselinearfunctionapproximationofthespeedfunctionofaprocessorwithhierarchicalmemorystructure.Theproceduretriestominimizetheexperimentaltimeusedforbuildingthespeedfunctionapproximation.WedemonstratetheefficiencyofourprocedurebyperformingexperimentswithamatrixmultiplicationapplicationandaCholeskyFactorizationapplicationthatusememoryhierarchyefficientlyandamatrixmultiplicationapplicationthatusesmemoryhierarchyinefficientlyonalocalnetworkofheterogeneouscomputers.1.IntroductionInourpreviousresearch[1],weaddressedtheproblemofoptimaldistributionorschedulingofcomputationaltasksonnetworksofheterogeneouscomputerswhenoneormoretasksdonotfitintothemainmemoryoftheprocessorsandwhenrelativespeedsofprocessorscannotbeaccuratelyapproximatedbyconstantfunctionsoftheproblemsize.Wedesignedefficientalgorithmstosolvethisschedulingproblemusingaperformancemodelthatintegratessomeoftheessentialfeaturesofaheterogeneousnetworkofcomputers(HNOC)havingamajorimpactontheperformance,suchastheprocessorheterogeneity,theheterogeneityofmemorystructure,andtheeffectsofpaging.Underthismodel,thespeedofeachprocessorisrepresentedbyacontinuousandrelativelysmoothfunctionoftheproblemsize.Thismodelisapplication-centricinthesensethatgenerallyspeakingdifferentapplicationswillcharacterizethespeedoftheprocessorbydifferentfunctions.Actuallyongeneral-purposecommonheterogeneousnetworks,2sizeoftheproblemAbsolutespeed)(1xs)(2xsFigure1.Usingpiecewiselinearapproximationtobuildspeedbandsfor2processors.Thecircularpointsareexperimentallyobtainedwhereasthesquarepointsarecalculatedusingheuristics.Thespeedbandforprocessors1(x)isbuiltfrom3experimentallyobtainedpoints(applicationrunonthisprocessorusesmemoryhierarchyinefficiently)whereasthespeedbands2(x)(applicationrunonthisprocessorusesmemoryhierarchyefficiently)isbuiltfrom4experimentallyobtainedpoints.anintegratedcomputerwillexperienceconstantandstochasticfluctuationsintheworkload.Thischangingtransientloadwillcauseafluctuationinthespeedofthecomputerinthesensethattheexecutiontimeofthesametaskofthesamesizewillvaryfordifferentrunsatdifferenttimes.Thenaturalwaytorepresenttheinherentfluctuationsinthespeedistouseaspeedbandratherthanaspeedfunction.Thewidthofthebandcharacterizestheleveloffluctuationintheperformanceduetochangesinloadovertime.Inourpreviousresearch,wedidnotproposeanymethodstobuildandmaintainthespeedbandofaprocessor.Inthispaper,wepresentanefficientandapracticalprocedureforbuildingapiecewiselinearfunctionapproximationofthespeedbandofaprocessorwithhierarchicalmemorystructure.Thisbandshouldbeabletorepresentanyspeedfunctionoftheprocessor,thatis,anyspeedfunctionrepresentingtheperformanceoftheprocessorshouldfitintothespeedband.Theproceduretriestominimizetheexperimentaltimeusedforbuildingthepiecewiselinearfunction3approximationofthespeedband.Wedonotproposemethodstomaintainthespeedfunctionapproximation.Thisisasubjectofourfutureresearch.Samplepiece-wiselinearfunctionapproximationsofthespeedbandofaprocessorareshowninFigure1.Eachoftheapproximationsisbuiltusingasetoffewexperimentallyobtainedpoints.Themorepointsusedtobuildtheapproximation,themoreaccuratetheapproximationis.Howeveritisprohibitivelyexpensivetouselargenumberofpoints.Henceanoptimalsetoffewpointsneedstobechosentobuildanefficientpiecewiselinearfunctionapproximationofthespeedband.Suchanapproximationgivesthespeedoftheprocessorforanyproblemsizewithcertainaccuracywithintheinherentdeviationoftheperformanceofcomputerstypicallyobservedinthenetwork.Therestofthepaperisorganizedasfollows.Insection2,weformulatetheproblemofbuildingapiecewiselinearfunctionapproximationofaprocessorandpresentanefficientandapracticalproceduretosolvetheproblem.Todemonstratetheefficiencyofourprocedure,weperformexperimentsusingamatrixmultiplicationapplicationandaCholeskyFactorizationapplicationthatusememoryhierarchyefficientlyandamatrixmultiplicationapplicationthatusesmemoryhierarchyinefficientlyonalocalnetworkofheterogeneouscomputers.2.ProcedureforBuildingSpeedFunctionApproximationThissectionisorganizedasfollows.Westartwiththeformulationofthespeedbandapproximationbuildingproblem.Thisisfollowedbyasectiononobtainingtheloadfunctionscharacterizingtheleveloffluctuationinloadovertime.Thethirdsectionpresentstheassumptionsadoptedbyourprocedure.Wethenpresentsomeoperationsandrelationsrelatedtothepiecewiselinearfunctionapproximationofthespeedband.Andfinallyweexplainourproceduretobuildthepiecewiselinearfunctionapproximation.4SizeoftheproblemAbsolutespeedSizeoftheproblemAbsolutespeedreal-lifepiecewise(a)(b)SizeoftheproblemAbsolutespeedx))(,(maxxsx))(,(minxsxSizeoftheproblemAbsolutespeed1x2x3x))(,(1min1xsx))(,(2min2xsx))(,(3min3xsx))(,(1max1xsx))(,(2max2xsx))(,(3max3xsx(c)(d)Figure2.(a)Real-l
本文标题:1 An Efficient Procedure for Building the Function
链接地址:https://www.777doc.com/doc-3376166 .html