Logarithmic Regret Algorithms for Online Convex Optimization

Elad Hazan¹*, Adam Kalai², Satyen Kale¹*, and Amit Agarwal¹

¹ Princeton University, {ehazan,satyen,aagarwal}@princeton.edu
² TTI-Chicago, kalai@tti-c.org

* Supported by Sanjeev Arora's NSF grants MSPA-MCS 0528414, CCF 0514993, ITR 0205594.

Abstract. In an online convex optimization problem a decision-maker makes a sequence of decisions, i.e., chooses a sequence of points in Euclidean space, from a fixed feasible set. After each point is chosen, it encounters a sequence of (possibly unrelated) convex cost functions. Zinkevich [Zin03] introduced this framework, which models many natural repeated decision-making problems and generalizes many existing problems such as Prediction from Expert Advice and Cover's Universal Portfolios. Zinkevich showed that a simple online gradient descent algorithm achieves additive regret $O(\sqrt{T})$, for an arbitrary sequence of $T$ convex cost functions (of bounded gradients), with respect to the best single decision in hindsight.

In this paper, we give algorithms that achieve regret $O(\log(T))$ for an arbitrary sequence of strictly convex functions (with bounded first and second derivatives). This mirrors what has been done for the special cases of prediction from expert advice by Kivinen and Warmuth [KW99], and Universal Portfolios by Cover [Cov91]. We propose several algorithms achieving logarithmic regret, which besides being more general are also much more efficient to implement.

The main new ideas give rise to an efficient algorithm based on the Newton method for optimization, a new tool in the field. Our analysis shows a surprising connection to the follow-the-leader method, and builds on the recent work of Agarwal and Hazan [AH05]. We also analyze other algorithms, which tie together several different previous approaches including follow-the-leader, exponential weighting, Cover's algorithm and gradient descent.

1 Introduction

In the problem of online convex optimization [Zin03], there is a fixed convex compact feasible set $K \subset \mathbb{R}^n$ and an arbitrary, unknown sequence of convex cost functions $f_1, f_2, \ldots : K \to \mathbb{R}$. The decision maker must make a sequence of decisions, where the $t$-th decision is a selection of a point $x_t \in K$, and there is a cost of $f_t(x_t)$ on period $t$. However, $x_t$ is chosen with only the knowledge of the set $K$, the previous points $x_1, \ldots, x_{t-1}$, and the previous functions $f_1, \ldots, f_{t-1}$. Examples include many repeated decision problems:

Example 1: Production. Consider a company deciding how much of $n$ different products to produce. In this case, their profit may be assumed to be a concave function of their production (the goal is to maximize profit rather than minimize cost). This decision is made repeatedly, and the model allows the profit functions to be changing arbitrary concave functions, which may depend on various factors such as the economy.

Example 2: Linear prediction with a convex loss function. In this setting, there is a sequence of examples $(p_1, q_1), \ldots, (p_T, q_T) \in \mathbb{R}^n \times [0,1]$. For each $t = 1, 2, \ldots, T$, the decision-maker makes a linear prediction of $q_t \in [0,1]$, namely $x_t \cdot p_t$ for some $x_t \in \mathbb{R}^n$, and suffers loss $L(q_t, x_t \cdot p_t)$, where $L : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ is some fixed, known convex loss function, such as the quadratic loss $L(q, q') = (q - q')^2$. The online convex optimization framework permits this example, because the function $f_t(x) = L(q_t, x \cdot p_t)$ is a convex function of $x \in \mathbb{R}^n$. This problem of linear prediction with a convex loss function has been well studied (e.g., [CBL06]), and hence one would prefer to use the near-optimal algorithms that have been developed especially for that problem. We mention this application only to point out the generality of the online convex optimization framework.

Example 3: Portfolio management. In this setting, for each $t = 1, \ldots, T$ an online investor chooses a distribution $x_t$ over $n$ stocks in the market. The market outcome at iteration $t$ is captured by a price-relatives vector $c_t$, such that the loss to the investor is $-\log(x_t \cdot c_t)$ (see Cover [Cov91] for motivation and more detail regarding the model). Again, the online convex optimization framework permits this example, because the function $f_t(x) = -\log(x \cdot c_t)$ is a convex function of $x \in \mathbb{R}^n$.
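To make the protocol concrete, here is a minimal sketch, not from the paper, of the online convex optimization game, instantiated with the loss functions of Examples 2 and 3. The `learner` object and its `predict`/`update` interface are hypothetical names introduced only for illustration.

```python
import numpy as np

def play_online_convex_optimization(learner, loss_fns):
    """Run the online protocol: at round t the learner commits to a point
    x_t knowing only f_1, ..., f_{t-1}, and only then pays f_t(x_t)."""
    total_loss = 0.0
    for f_t in loss_fns:
        x_t = learner.predict()   # choose x_t in the feasible set K
        total_loss += f_t(x_t)    # incur the cost f_t(x_t)
        learner.update(f_t)       # only now may the learner observe f_t
    return total_loss

# Example 2: linear prediction with quadratic loss, f_t(x) = (q_t - x . p_t)^2.
def prediction_loss(p_t, q_t):
    return lambda x: (q_t - x @ p_t) ** 2

# Example 3: portfolio management, f_t(x) = -log(x . c_t), where x is a
# distribution over n stocks and c_t is the price-relatives vector.
def portfolio_loss(c_t):
    return lambda x: -np.log(x @ c_t)
```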
This paper shows how three seemingly different approaches can be used to achieve logarithmic regret in the case of some higher-order derivative assumptions on the functions. The algorithms are relatively easy to state. In some cases, the analysis is simple, and in others it relies on a carefully constructed potential function due to Agarwal and Hazan [AH05]. Lastly, our gradient descent results relate to previous analyses of stochastic gradient descent [Spa03], which is known to converge at a rate of $1/T$ for $T$ steps of gradient descent under various assumptions on the distribution over functions. Our results imply a $\log(T)/T$ convergence rate for the same problems, though, as is common in the online setting, the assumptions and guarantees are simpler and stronger than their stochastic counterparts.

1.1 Our results

The regret of the decision maker at time $T$ is defined to be its total cost minus the cost of the best single decision, where the best is chosen with the benefit of hindsight:

$$\mathrm{regret}_T = \mathrm{regret} = \sum_{t=1}^{T} f_t(x_t) - \min_{x \in K} \sum_{t=1}^{T} f_t(x).$$

A standard goal in machine learning and game theory is to achieve algorithms with guaranteed low regret (this goal is also motivated by psychology). Zinkevich showed that one can guarantee $O(\sqrt{T})$ regret for an arbitrary sequence of differentiable convex functions of bounded gradient, which is tight up to constant factors. In fact, $\Omega(\sqrt{T})$ regret is unavoidable even when the functions come from a fixed distribution rather than being chosen adversarially.

Variable             Meaning
$K \subseteq \mathbb{R}^n$   the convex compact feasible set
$D \ge 0$            the diameter of $K$: $D = \sup_{x,y \in K} \|x - y\|$
$f_1, \ldots, f_T$   sequence of $T$ twice-differentiable convex functions $f_t : \mathbb{R}^n \to \mathbb{R}$
$G \ge 0$            $\|\nabla f_t(x)\| \le G$ for all $x \in K$, $t \le T$ (in one dimension, $|f_t'(x)| \le G$)
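For orientation, here is a minimal sketch, in the notation of the table above, of Zinkevich's online gradient descent referenced earlier: at each round it steps against the gradient of the last observed cost function and projects back onto $K$. The step-size schedule $\eta_t = D/(G\sqrt{t})$ and the ball-projection helper are standard illustrative choices, not prescribed by this excerpt.

```python
import numpy as np

def project_onto_ball(y, radius=1.0):
    """Euclidean projection onto a ball of the given radius, as one concrete
    feasible set K; any convex compact K with a projection oracle works."""
    norm = np.linalg.norm(y)
    return y if norm <= radius else (radius / norm) * y

def online_gradient_descent(grads, x1, D, G, project=project_onto_ball):
    """Sketch of online gradient descent [Zin03]:
    x_{t+1} = Pi_K(x_t - eta_t * grad f_t(x_t)) with eta_t = D / (G * sqrt(t)),
    which guarantees O(DG * sqrt(T)) regret when ||grad f_t(x)|| <= G on a
    feasible set K of diameter D."""
    x, decisions = x1, []
    for t, grad_f_t in enumerate(grads, start=1):
        decisions.append(x)                    # play x_t, then observe f_t
        eta_t = D / (G * np.sqrt(t))           # standard 1/sqrt(t) step size
        x = project(x - eta_t * grad_f_t(x))   # projected gradient step
    return decisions
```

The logarithmic-regret gradient descent result of this paper keeps this skeleton; roughly speaking, the curvature assumptions allow the step size to shrink on the order of $1/t$ rather than $1/\sqrt{t}$, which is what improves the regret from $O(\sqrt{T})$ to $O(\log T)$.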