您好,欢迎访问三七文档
JournalofArtificialIntelligenceResearch25(2006)119-157Submitted04/05;published02/06LearninginReal-TimeSearch:AUnifyingFrameworkVadimBulitkoBULITKO@UALBERTA.CAGregLeeGREGLEE@CS.UALBERTA.CADepartmentofComputingScienceUniversityofAlbertaEdmonton,AlbertaT6G2E8,CANADAAbstractReal-timesearchmethodsaresuitedfortasksinwhichtheagentisinteractingwithaninitiallyunknownenvironmentinrealtime.Insuchsimultaneousplanningandlearningproblems,theagenthastoselectitsactionsinalimitedamountoftime,whilesensingonlyalocalpartoftheenviron-mentcenteredattheagent’scurrentlocation.Real-timeheuristicsearchagentsselectactionsusingalimitedlookaheadsearchandevaluatingthefrontierstateswithaheuristicfunction.Overre-peatedexperiences,theyrefineheuristicvaluesofstatestoavoidinfiniteloopsandtoconvergetobettersolutions.Thewidespreadofsuchsettingsinautonomoussoftwareandhardwareagentshasledtoanexplosionofreal-timesearchalgorithmsoverthelasttwodecades.Notonlyisapotentialuserconfrontedwithahodgepodgeofalgorithms,buthealsofacesthechoiceofcontrolparameterstheyuse.Inthispaperweaddressbothproblems.Thefirstcontributionisanintroductionofasim-plethree-parameterframework(namedLRTS)whichextractsthecoreideasbehindmanyexistingalgorithms.WethenprovethatLRTA*,-LRTA*,SLA*,andγ-Trapalgorithmsarespecialcasesofourframework.Thus,theyareunifiedandextendedwithadditionalfeatures.Second,weprovecompletenessandconvergenceofanyalgorithmcoveredbytheLRTSframework.Third,weproveseveralupper-boundsrelatingthecontrolparametersandsolutionquality.Finally,weanalyzetheinfluenceofthethreecontrolparametersempiricallyintherealisticscalabledomainsofreal-timenavigationoninitiallyunknownmapsfromacommercialrole-playinggameaswellasroutinginadhocsensornetworks.1.MotivationInthispaper,weconsiderasimultaneousplanningandlearningproblem.Onemotivatingapplica-tionlieswithnavigationonaninitiallyunknownmapunderreal-timeconstraints.Asanexample,considerarobotdrivingtoworkeverymorning.Imaginetherobottobeanewcomertothetown.Thefirstroutetherobotfindsmaynotbeoptimalbecausetrafficjams,roadconditions,andotherfactorsareinitiallyunknown.Withthepassageoftime,therobotcontinuestolearnandeventuallyconvergestoanearlyoptimalcommute.Notethatplanningandlearninghappenwhiletherobotisdrivingandthereforearesubjecttotimeconstraints.Present-daymobilerobotsareoftenplaguedbylocalizationproblemsandpowerlimitations,buttheirsimulationcounter-partsalreadyallowresearcherstofocusontheplanningandlearningproblem.Forinstance,theRoboCupRescuesimulationleague(Kitano,Tadokoro,Noda,Matsub-ara,Takahashi,Shinjou,&Shimada,1999)requiresreal-timeplanningandlearningwithmultipleagentsmappingoutanunknownterrain.Pathfindingisdoneinrealtimeasvariouscrises,involvingfirespreadandhumanvictimstrappedinrubble,progresswhiletheagentsplan.Similarly,manycurrent-generationreal-timestrategygamesemployaprioriknownmaps.FullknowledgeofthemapsenablescompletesearchmethodssuchasA*(Hart,Nilsson,&Raphael,c2006AIAccessFoundation.Allrightsreserved.BULITKO&LEE1968)andDijkstra’salgorithm(Dijkstra,1959).Prioravailabilityofthemapsallowspathfindingenginestopre-computevariousdatatospeedupon-linenavigation.Examplesofsuchdataincludevisibilitygraphs(Woodcock,2000),influencemaps(Pottinger,2000),spacetriangulation(Kall-mann,Bieri,&Thalmann,2003),stateabstractionhierarchies(Holte,Drummond,Perez,Zimmer,&MacDonald,1994;Holte,1996;Botea,M¨uller,&Schaeffer,2004)androutewaypoints(Reece,Krauss,&Dumanoir,2000).However,theforthcominggenerationsofcommercialandacademicgames(Buro,2002)willrequiretheagenttocopewithinitiallyunknownmapsviaexplorationandlearningduringthegame,andthereforewillgreatlylimittheapplicabilityofcompletesearchalgorithmsandpre-computationtechniques.IncrementalsearchmethodssuchasdynamicA*(D*)(Stenz,1995)andD*Lite(Koenig&Likhachev,2002)candealwithinitiallyunknownmapsandarewidelyusedinrobotics,includingDARPA’sUnmannedGroundVehicleprogram,Marsrover,andothermobilerobotprototypes(Her-bert,McLachlan,&Chang,1999;Thayer,Digney,Diaz,Stentz,Nabbe,&Hebert,2000).Theyworkwellwhentherobot’smovementsareslowwithrespecttoitsplanningspeed(Koenig,2004).Inreal-timestrategygames,however,theAIenginecanberesponsibleforhundredstothousandsofagentstraversingamapsimultaneouslyandtheplanningcostbecomesamajorfactor.Toillus-trate:evenatthesmallerscaleofthesix-yearold“AgeofEmpires2”(Ensemble-Studios,1999),60-70%ofsimulationtimeisspentinpathfinding(Pottinger,2000).Thisgivesrisetothefollowingquestions:1.Howcanplanningtimepermove,andparticularlythefirst-movedelay,beminimizedsothateachagentmovessmoothlyandrespondstouserrequestsnearlyinstantly?2.Givenreal-timeexecution,localsensoryinformation,andinitiallyunknownterrain,howcantheagentlearnanear-optimalpathand,atthesametime,minimizethelearningtimeandmemoryrequired?Therestofthepaperisorganizedasfollows.Wefirstintroduceafamilyofreal-timesearchalgo-rithmsdesignedtoaddressthesequestions.Wethenmakethefirstcontributionbydefiningasimpleparameterizedframeworkthatunifiesandextendsseveralpopularreal-timesearchalgorithms.Thesecondcontributionlieswithatheoreticalanalysisoftheresultingframe
本文标题:Learning in real-time search A unifying framework
链接地址:https://www.777doc.com/doc-6269156 .html