Monitoring and Evaluation of Parallel and Distributed Systems

Richard Hofmann
University Erlangen, IMMD VII
Martensstr. 3, D-91058 Erlangen
phone: ++49-9131-85-7026
email: rhofmann@informatik.uni-erlangen.de

Abstract

Due to the complex interactions between activities in parallel processes, the dynamic behavior of the system cannot be quantified a priori. However, a profound knowledge about what is going on in the system is the basis for balancing the load in order to optimally utilize the potential power of such a parallel system. Monitoring is a valuable aid in getting the necessary insight into this dynamic behavior of interacting processes.

In the first part of the tutorial, the principles of measurement-based performance analysis in parallel and distributed systems are discussed. General topics concerned with hardware, software, and hybrid monitoring are presented with examples, and rules are given for choosing the appropriate monitoring technique. As an example, ZM4, a universal distributed monitor system, is introduced.

The second part of the tutorial deals with all tasks related to the process of presenting the meaning of trace data to human beings. Trace evaluation can be performed with statistics-oriented tools that compute common trace statistics, find activities, and validate assertions on system behavior, as well as with interactive graphics-oriented tools that present state time diagrams or draw causality diagrams between process traces. All these tools will be introduced with examples from measurements at practical parallel and distributed systems.

1 Introduction

The raw computing power of modern computers is growing rapidly with time. One should expect that the power at the user's disposal grows at the same rate. However, experience shows a sometimes rising but sometimes falling amount of power that can really be used. The reasons for this phenomenon are manifold: users expect a more comfortable environment that costs computing power, and security mechanisms also account for a significant part of the raw processor power. It is probably not possible to get rid of these effects.

Another source of wasted processor power can be remedied by careful design of software systems on the one hand and thorough analysis of the runtime behavior on the other hand. While performance analysis is an important issue in monoprocessor systems, it is an indispensable task in parallel and distributed systems. This is caused by the complex interactions between the different program parts, all cooperating in order to solve a common task. Necessarily using shared resources causes problems with process synchronization, waiting times, deadlocks and the like. Beyond the merely functional problems caused by this difficulty in managing parallel tasks, there is a high probability of wasting processor power, i.e. not exploiting the processor power at a sufficiently high level.

This tutorial paper first deals with the basic problems in parallel and distributed systems in order to establish a common understanding of the reasons for such a power loss. In general, this topic can be treated by regarding causal relationships between events on different cooperating processors. In the second part, an introduction to monitoring of parallel and distributed systems will be presented. It will be shown how different monitoring approaches can be designed by systems programmers as well as by users of a parallel and distributed system.

Parallel and distributed systems require a monitoring facility that is able to cope with a larger number of processors as well as with spatial distribution. For this reason, ZM4, a monitor system that is being used for many projects, will be introduced as an example for structuring and using a universal distributed monitor system.

Using event-based monitoring typically yields large traces even if the events are chosen carefully. In order to concentrate work on promising parts of the event trace, it is necessary to point out the location of the problem. Therefore, statistical methods are used for a quick overview and a coarse analysis. With that insight, more detailed methods can be applied. Their common goal is not only to have a measure for the performance of these systems in part or in total, but also to get insight into the dynamic behavior of the interacting processes.

The most important methods for visualizing the dynamic behavior of parallel processes are time state diagrams (Gantt charts) and causality diagrams (Hasse diagrams). Each of these methods is discussed with an example in the remainder of this paper. Based on this information, the system can be reprogrammed in order to improve its performance.

2 Performance Problems in P&D Systems

Tuning programs for single processor machines is fairly easy: use profiling to find those parts of the program that are used predominantly. Typically, this is only a small fraction of the whole code. Rewriting these parts of the program yields a higher performance.

This simple conclusion does not hold for programs running on parallel and distributed systems. For example, a part of a program that has to wait for an intermediate result from another process will not profit from tuning: it simply has to wait for a longer time.

In order to determine why a program behaves the way it does, the reason for this behavior must be sought. This leads to regarding causality in computer systems. As can be seen in a later section, analyzing parallel and distributed systems from a causality point of view can lead to interesting results.

2.1 Causality and Computer Systems

Generally, the term causality denotes a law where a specific action always leads to the same specific result. Adapted to computer systems, causality means that the behavior of their processes is ruled by the laws expressed in the program. Here, the future of each process depend
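
The waiting example from Section 2 can be made concrete with a small model. The following is only a minimal sketch under stated assumptions: two processes, where a consumer does some preparation, blocks until a producer delivers an intermediate result, and then finishes. The function name `completion_time` and all time values are hypothetical and chosen purely for illustration; they are not taken from the tutorial or from ZM4.

```python
def completion_time(producer_work, consumer_prep, consumer_finish):
    """Return (finish time, waiting time) of the consumer process.

    Hypothetical two-process model: the consumer prepares for
    `consumer_prep` time units, then blocks until the producer has
    delivered its intermediate result after `producer_work` time units,
    then needs `consumer_finish` more time units.
    """
    result_ready = producer_work      # time at which the result is available
    consumer_ready = consumer_prep    # time at which the consumer starts waiting
    wait = max(0.0, result_ready - consumer_ready)
    return consumer_ready + wait + consumer_finish, wait


# Baseline: the consumer reaches the wait point long before the result exists.
baseline, wait_before = completion_time(10.0, 6.0, 2.0)

# Tuning the waiting part: preparation is twice as fast, nothing is gained.
tuned_consumer, wait_after = completion_time(10.0, 3.0, 2.0)

# Tuning the process that causes the wait shortens the overall run.
tuned_producer, _ = completion_time(5.0, 6.0, 2.0)

print("baseline:      ", baseline, "wait", wait_before)
print("tuned consumer:", tuned_consumer, "wait", wait_after)
print("tuned producer:", tuned_producer)
```

In this model the tuned consumer finishes at the same time as before and merely waits longer, while speeding up the producer reduces the finish time. A profile of the consumer alone would not reveal this; the causal dependency between the two processes has to be considered.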