Layered Learning in Multi-Agent Systems

Peter Stone

December 15, 1998

CMU-CS-98-187

School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213-3891

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Thesis Committee:
Manuela M. Veloso, Chair
Andrew W. Moore
Herbert A. Simon
Victor R. Lesser (University of Massachusetts, Amherst)

Copyright © 1998 Peter Stone

The work has been supported through the generosity of the NASA Graduate Student Research Program (GSRP). This research is also sponsored in part by the Defense Advanced Research Projects Agency (DARPA), and Rome Laboratory, Air Force Materiel Command, USAF, under agreement numbers F30602-95-1-0018, F30602-97-2-0250 and F30602-98-2-0135 and in part by the Department of the Navy, Office of Naval Research under contract number N00014-95-1-0591. Views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of NASA, the Defense Advanced Research Projects Agency (DARPA), the Air Force Research Laboratory (AFRL), the Department of the Navy, Office of Naval Research, or the U.S. Government.

Keywords: Multi-agent systems, machine learning, multi-agent learning, control learning, hierarchical learning, reinforcement learning, decision tree learning, neural networks, robotic soccer, network routing

Abstract

Multi-agent systems in complex, real-time domains require agents to act effectively both autonomously and as part of a team. This dissertation addresses multi-agent systems consisting of teams of autonomous agents acting in real-time, noisy, collaborative, and adversarial environments. Because of the inherent complexity of this type of multi-agent system, this thesis investigates the use of machine learning within multi-agent systems. The dissertation makes four main contributions to the fields of Machine Learning and Multi-Agent Systems.

First, the thesis defines a team member agent architecture within which a flexible team structure is presented, allowing agents to decompose the task space into flexible roles and allowing them to smoothly switch roles while acting. Team organization is achieved by the introduction of a locker-room agreement as a collection of conventions followed by all team members. It defines agent roles, team formations, and pre-compiled multi-agent plans. In addition, the team member agent architecture includes a communication paradigm for domains with single-channel, low-bandwidth, unreliable communication. The communication paradigm facilitates team coordination while being robust to lost messages and active interference from opponents.

Second, the thesis introduces layered learning, a general-purpose machine learning paradigm for complex domains in which learning a mapping directly from agents’ sensors to their actuators is intractable. Given a hierarchical task decomposition, layered learning allows for learning at each level of the hierarchy, with learning at each level directly affecting learning at the next higher level.

Third, the thesis introduces a new multi-agent reinforcement learning algorithm, namely team-partitioned, opaque-transition reinforcement learning (TPOT-RL). TPOT-RL is designed for domains in which agents cannot necessarily observe the state changes when other team members act. It exploits local, action-dependent features to aggressively generalize its input representation for learning and partitions the task among the agents, allowing them to simultaneously learn collaborative policies by observing the long-term effects of their actions.

Fourth, the thesis contributes a fully functioning multi-agent system that incorporates learning in a real-time, noisy domain with teammates and adversaries. Detailed algorithmic descriptions of the agents’ behaviors as well as their source code are included in the thesis.

Empirical results validate all four contributions within the simulated robotic soccer domain. The generality of the contributions is verified by applying them to the real robotic soccer, and network routing domains.

Ultimately, this dissertation demonstrates that by learning portions of their cognitive processes, selectively communicating, and coordinating their behaviors via common knowledge, a group of independent agents can work towards a common goal in a complex, real-time, noisy, collaborative, and adversarial environment.

Acknowledgements

I would like to thank many people for their support, encouragement and guidance during my years as a graduate student here at CMU.

First and foremost, this dissertation represents a great deal of time and effort not only on my part, but on the part of my advisor, Manuela Veloso. She has helped me shape my research from day one, pushed me to get through the inevitable research setbacks, and encouraged me to achieve to the best of my ability. Without Manuela, this dissertation would not have happened.

I also thank my other three committee members, Andrew Moore, Herb Simon, and Victor Lesser for valuable discussions and comments regarding my research.

Almost all research involving robots is a group effort. The members of the CMU robosoccer lab have all contributed to making my research possible. Sorin Achim, who has been with our project almost from the beginning has tirelessly experimented with different robot architectures, always managing to pull things together and create working hardware in time for competitions. Kwun Han was a partner in the software development of the CMUnited-97 team, as well as an instrumental hardware developer for CMUnited-98. Mike Bowling successfully created a new software approach for the CMUnited-98 robots. He also collaborated on an early simulator agent implementation.