A Polynomial-time Nash Equilibrium Algorithm for Repeated Games

Michael L. Littman
Dept. of Computer Science, Rutgers University, Piscataway, NJ 08854-8019 USA

Peter Stone
Dept. of Computer Sciences, The University of Texas at Austin, Austin, Texas 78712-1188 USA

Abstract

With the increasing reliance on game theory as a foundation for auctions and electronic commerce, efficient algorithms for computing equilibria in multiplayer general-sum games are of great theoretical and practical interest. The computational complexity of finding a Nash equilibrium for a one-shot bimatrix game is a well known open problem. This paper treats a related but distinct problem, that of finding a Nash equilibrium for an average-payoff repeated bimatrix game, and presents a polynomial-time algorithm. Our approach draws on the well known "folk theorem" from game theory and shows how finite-state equilibrium strategies can be found efficiently and expressed succinctly.

Keywords: Repeated games, complexity analysis, Nash equilibrium, computational game theory
PACS: F.2.m

Email addresses: mlittman@cs.rutgers.edu (Michael L. Littman), pstone@cs.utexas.edu (Peter Stone).
URLs: mlittman/ (Michael L. Littman), (Peter Stone).

Preprint submitted to Elsevier Science, 6 May 2004

1 Introduction

The Nash equilibrium is one of the most important concepts in game theory, forming the basis of much recent work in multiagent decision making and electronic marketplaces. As such, efficiently computing Nash equilibria is one of the most important problems in computational game theory.

The central result of this paper is a polynomial-time algorithm for computing a Nash equilibrium for repeated 2-player (bimatrix) games, under the average-payoff criterion. This result stands in contrast to the problem of computing a Nash equilibrium in a one-shot game, the complexity of which remains an important and long-standing open problem [12].

The idea behind our algorithm echoes that of the well known "folk theorem" [11], which shows how the notion of threats can stabilize a wide range of payoff profiles in repeated games. While the folk theorem provides a constructive method for identifying Nash equilibria in repeated games, the contribution of this paper is to show how the threat idea can be used to create a computationally efficient equilibrium-finding algorithm. While drawing heavily on the folk theorem, our result is not an immediate corollary. In fact, while there are folk theorems for n-player repeated games, our polynomial-time algorithm is only valid for $n = 2$.

In the rest of the paper, we formally describe the problem (Section 2) and our algorithm for solving it (Section 3), and conclude with a set of illustrative examples (Section 4).

2 Problem Statement

A repeated bimatrix game is played by two players, 1 and 2, each with a set of action choices of size $n_1$ and $n_2$, respectively. The game is played in rounds, with the two players simultaneously making a choice of action at each round. If Player 1 chooses action $1 \le i_1 \le n_1$ and Player 2 chooses $1 \le i_2 \le n_2$, they receive payoffs of $P^1_{i_1 i_2}$ and $P^2_{i_2 i_1}$, respectively.¹

¹ For cleanliness of notation, we deviate from common practice and write matrices so that a player always chooses the row of its own payoff matrix, while the opponent always chooses the column.

In a repeated game, players select their actions, possibly stochastically, via a strategy, which is a function of the history of their interactions. The objective of each player in a repeated game is to adopt a strategy that maximizes its expected average payoff (limit of the means criterion). A pair of strategies is a Nash equilibrium if each strategy is optimized with respect to the other: neither player can improve its average payoff by changing strategies unilaterally [9].

As a running example in this paper, we use the well known Iterated Prisoner's Dilemma to illustrate and motivate our algorithm. In this repeated bimatrix game, on each round, each player can either cooperate (Action 1) or defect (Action 2). The two players use the same payoff matrix,

$$P^1 = P^2 = \begin{bmatrix} 3 & 0 \\ 5 & 1 \end{bmatrix}.$$

One pair of equilibrium strategies in the Prisoner's Dilemma is for both players to defect in every round. The average payoff in this case is 1 for both players. These strategies are in equilibrium because a player facing an "always defect" opponent will receive a payoff of zero for every round in which it selects the cooperate action; the best response to "always defect" is to always choose defect.

This paper considers the following computational problem. Given a game specified by payoff matrices $P^1$ and $P^2$, return a pair of strategies that constitutes a Nash equilibrium for the average-payoff repeated bimatrix game. The running time of the algorithm should be a polynomial function of the size of the input.

To fully specify the equilibrium-computation problem, we must be concrete about the input and output representations. The input representation is relatively straightforward. For $(p, q) \in \{(1, 2), (2, 1)\}$, the function $P^p$ is an $n_p \times n_q$ matrix. To bound the size of the numbers in these matrices, we assume they are rational numbers, specified as integer numerator and natural denominator of no more than $k$ bits. So, the running time of our algorithm needs to be a polynomial function of $n_1$, $n_2$, and $k$.

Note that the representation size of an integer is roughly its logarithm in base two and the representation size of a rational number is the sum of the sizes of its numerator and denominator. A polynomial-size number is one with representation size bounded by a polynomial function of the input size. Multiplying, dividing, adding or subtracting two polynomial-size rational numbers produces a polynomial-size result, as does solving a polynomial-size system of linear equations or linear program [14].

The output of an equilibrium computation is a pair of strategies. It is well known that every bimatrix game has at least one pair of strategies that is a Nash equilibrium. However, strategies in repeated games can be infinitely large objects mapping the interaction h
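The running example and the input representation from Section 2 can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it only encodes the Prisoner's Dilemma payoff matrix in the paper's row convention (own action indexes the row), checks that defect is the one-round best response to defect, and measures the representation size of a rational payoff. The helper names `best_responses`, `average_payoff`, and `representation_size` are hypothetical, and the average is taken over a finite prefix of play as a stand-in for the limit of the means.

```python
from fractions import Fraction

# Payoff matrix of the running example, in the paper's convention:
# the row is the player's own action, the column is the opponent's action.
# Index 0 = cooperate (Action 1), index 1 = defect (Action 2).
PD = [[Fraction(3), Fraction(0)],
      [Fraction(5), Fraction(1)]]


def best_responses(payoff, opponent_action):
    """Own actions that maximize the one-round payoff against a fixed
    opponent action (hypothetical helper, not from the paper)."""
    column = [row[opponent_action] for row in payoff]
    best = max(column)
    return [action for action, value in enumerate(column) if value == best]


def average_payoff(payoff, own_actions, opponent_actions):
    """Mean payoff over a finite prefix of play. This is an illustration
    only; the paper's criterion is the limit of the means."""
    total = sum(payoff[a][b] for a, b in zip(own_actions, opponent_actions))
    return total / len(own_actions)


def representation_size(q):
    """Approximate bit size of a rational: numerator bits plus denominator bits."""
    return abs(q.numerator).bit_length() + q.denominator.bit_length()


# Against "always defect" (opponent action 1), defect is the unique best response.
print(best_responses(PD, 1))
# Both players defecting for four rounds gives an average payoff of 1.
print(average_payoff(PD, [1, 1, 1, 1], [1, 1, 1, 1]))
```

Against an opponent that always defects, a single round of cooperation in a four-round prefix lowers the prefix average from 1 to 3/4, which matches the argument that cooperating earns zero in every round where it is chosen. Using `Fraction` keeps all payoffs exact, in line with the assumption that inputs are rational numbers with bounded numerator and denominator.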
