您好,欢迎访问三七文档
APolynomial-timeNashEquilibriumAlgorithmforRepeatedGamesMihaelL.LittmanDept.ofComputerSieneRutgersUniversityPisataway,NJ08854-8019USAPeterStoneDept.ofComputerSienesTheUniversityofTexasatAustinAustin,Texas78712-1188USAAbstratWiththeinreasingrelianeongametheoryasafoundationforautionsandeletroniommere,eÆientalgorithmsforomputingequilibriainmultiplayergeneral-sumgamesareofgreattheoretialandpratialinterest.Theomputa-tionalomplexityof ndingaNashequilibriumforaone-shotbimatrixgameisawellknownopenproblem.Thispapertreatsarelatedbutdistintproblem,thatof ndingaNashequilibriumforanaverage-payo repeatedbimatrixgame,andpresentsapolynomial-timealgorithm.Ourapproahdrawsonthewellknown\folktheoremfromgametheoryandshowshow nite-stateequilibriumstrategiesanbefoundeÆientlyandexpressedsuintly.Keywords:Repeatedgames,omplexityanalysis,Nashequilibrium,omputationalgametheoryPACS:F.2.mEmailaddresses:mlittmans.rutgers.edu(MihaelL.Littman),pstones.utexas.edu(PeterStone).URLs: mlittman/(MihaelL.Littman),(PeterStone).PreprintsubmittedtoElsevierSiene6May20041IntrodutionTheNashequilibriumisoneofthemostimportantoneptsingametheory,formingthebasisofmuhreentworkinmultiagentdeisionmakingandeletronimarketplaes.Assuh,eÆientlyomputingNashequilibriaisoneofthemostimportantproblemsinomputationalgametheory.Theentralresultofthispaperisapolynomial-timealgorithmforomputingaNashequilibriumforrepeated2-player(bimatrix)games,undertheaverage-payo riterion.ThisresultstandsinontrasttotheproblemofomputingaNashequilibriuminaone-shotgame,theomplexityofwhihremainsanimportantandlong-standingopenproblem[12℄.Theideabehindouralgo-rithmehoesthatofthewellknown\folktheorem[11℄,whihshowshowthenotionofthreatsanstabilizeawiderangeofpayo pro lesinrepeatedgames.Whilethefolktheoremprovidesaonstrutivemethodforidentify-ingNashequilibriainrepeatedgames,theontributionofthispaperistoshowhowthethreatideaanbeusedtoreateanomputationallyeÆientequilibrium- ndingalgorithm.Whiledrawingheavilyonthefolktheorem,ourresultisnotanimmediateorollary.Infat,whiletherearefolktheoremsforn-playerrepeatedgames,ourpolynomial-timealgorithmisonlyvalidforn=2.Intherestofthepaper,weformallydesribetheproblem(Setion2)andouralgorithmforsolvingit(Setion3),andonludewithasetofillustrativeexamples(Setion4).2ProblemStatementArepeatedbimatrixgameisplayedbytwoplayers,1and2,eahwithasetofationhoiesofsizen1andn2,respetively.Thegameisplayedinrounds,withthetwoplayerssimultaneouslymakingahoieofationateahround.IfPlayer1hoosesation1 i1 n1andPlayer2hooses1 i2 n2,theyreeivepayo sofP1i1i2andP2i2i1,respetively1.Inarepeatedgame,playersselettheirations,possiblystohastially,viaastrategy|afuntionofthehistoryoftheirinterations.Theobjetiveofeahplayerinarepeatedgameistoadoptastrategythatmaximizesitsexpetedaveragepayo (limitofthemeansriterion).Apairof1Forleanlinessofnotation,wedeviatefromommonpratieandwritematriessothataplayeralwayshoosestherowofitsownpayo matrix,whiletheopponentalwayshoosestheolumn.2strategiesisaNashequilibriumifeahstrategyisoptimizedwithrespettotheother|neitherplayeranimproveitsaveragepayo byhangingstrategiesunilaterally[9℄.Asarunningexampleinthispaper,weusethewellknownIteratedPrisoner’sDilemmatoillustrateandmotivateouralgorithm.Inthisrepeatedbimatrixgame,oneahround,eahplayeraneitherooperate(Ation1)ordefet(Ation2).Thetwoplayersusethesamepayo matrix,P1=P2=2643051375.OnepairofequilibriumstrategiesinthePrisoner’sDilemmaisforbothplayerstodefetineveryround.Theaveragepayo inthisaseis1forbothplayers.Thesestrategiesareinequilibriumbeauseaplayerfaingan\alwaysdefetopponentwillreeiveapayo ofzeroforeveryroundinwhihitseletstheooperateation;thebestrespondto\alwaysdefetistoalwayshoosedefet.Thispaperonsidersthefollowingomputationalproblem.Givenagamespe-i edbypayo matriesP1andP2,returnapairofstrategiesthatonstitutesaNashequilibriumfortheaverage-payo repeatedbimatrixgame.Therun-ningtimeofthealgorithmshouldbeapolynomialfuntionofthesizeoftheinput.Tofullyspeifytheequilibrium-omputationproblem,wemustbeonreteabouttheinputandoutputrepresentations.Theinputrepresentationisrela-tivelystraightforward.For(p;q)2f(1;2);(2;1)g,thefuntionPpisannp nqmatrix.Toboundthesizeofthenumbersinthesematries,weassumetheyarerationalnumbers,spei edasintegernumeratorandnaturaldenominatorofnomorethankbits.So,therunningtimeofouralgorithmneedstobeapolynomialfuntionofn1,n2,andk.Notethattherepresentationsizeofanintegerisroughlyitslogarithminbasetwoandtherepresentationsizeofarationalnumberisthesumofthesizesofitsnumeratoranddenominator.Apolynomial-sizenumberisonewithrepresentationsizeboundedbyapolynomialfuntionoftheinputsize.Multi-plying,dividing,addingorsubtratingtwopolynomial-sizerationalnumbersproduesapolynomial-sizeresult,asdoessolvingapolynomial-sizesystemoflinearequationsorlinearprogram[14℄.Theoutputofanequilibriumomputationisapairofstrategies.ItiswellknownthateverybimatrixgamehasatleastonepairofstrategiesthatisaNashequilibrium.However,strategiesinrepeatedgamesanbein nitelylargeobjetsmappingtheinterationh
本文标题:A polynomial-time Nash equilibrium algorithm for r
链接地址:https://www.777doc.com/doc-5353683 .html