Logarithmic Regret Algorithms for Online Convex Optimization

Elad Hazan¹*, Adam Kalai², Satyen Kale¹*, and Amit Agarwal¹

¹ Princeton University, {ehazan,satyen,aagarwal}@princeton.edu
² TTI-Chicago, kalai@tti-c.org

* Supported by Sanjeev Arora's NSF grants MSPA-MCS 0528414, CCF 0514993, ITR 0205594.

Abstract. In an online convex optimization problem a decision-maker makes a sequence of decisions, i.e., chooses a sequence of points in Euclidean space, from a fixed feasible set. After each point is chosen, it encounters a sequence of (possibly unrelated) convex cost functions. Zinkevich [Zin03] introduced this framework, which models many natural repeated decision-making problems and generalizes many existing problems such as Prediction from Expert Advice and Cover's Universal Portfolios. Zinkevich showed that a simple online gradient descent algorithm achieves additive regret $O(\sqrt{T})$, for an arbitrary sequence of $T$ convex cost functions (of bounded gradients), with respect to the best single decision in hindsight.

In this paper, we give algorithms that achieve regret $O(\log(T))$ for an arbitrary sequence of strictly convex functions (with bounded first and second derivatives). This mirrors what has been done for the special cases of prediction from expert advice by Kivinen and Warmuth [KW99], and Universal Portfolios by Cover [Cov91]. We propose several algorithms achieving logarithmic regret, which besides being more general are also much more efficient to implement.

The main new ideas give rise to an efficient algorithm based on the Newton method for optimization, a new tool in the field. Our analysis shows a surprising connection to the follow-the-leader method, and builds on the recent work of Agarwal and Hazan [AH05]. We also analyze other algorithms, which tie together several different previous approaches including follow-the-leader, exponential weighting, Cover's algorithm and gradient descent.

1 Introduction

In the problem of online convex optimization [Zin03], there is a fixed convex compact feasible set $K \subset \mathbb{R}^n$ and an arbitrary, unknown sequence of convex cost functions $f_1, f_2, \ldots : K \to \mathbb{R}$. The decision maker must make a sequence of decisions, where the $t$-th decision is a selection of a point $x_t \in K$, and there is a cost of $f_t(x_t)$ on period $t$. However, $x_t$ is chosen with only the knowledge of the set $K$, the previous points $x_1, \ldots, x_{t-1}$, and the previous functions $f_1, \ldots, f_{t-1}$. Examples include many repeated decision problems:

Example 1: Production. Consider a company deciding how much of $n$ different products to produce. In this case, their profit may be assumed to be a concave function of their production (the goal is to maximize profit rather than minimize cost). This decision is made repeatedly, and the model allows the profit functions to be changing arbitrary concave functions, which may depend on various factors such as the economy.

Example 2: Linear prediction with a convex loss function. In this setting, there is a sequence of examples $(p_1, q_1), \ldots, (p_T, q_T) \in \mathbb{R}^n \times [0,1]$. For each $t = 1, 2, \ldots, T$, the decision-maker makes a linear prediction of $q_t \in [0,1]$, namely $x_t \cdot p_t$ for some $x_t \in \mathbb{R}^n$, and suffers loss $L(q_t, x_t \cdot p_t)$, where $L : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ is some fixed, known convex loss function, such as the quadratic loss $L(q, q') = (q - q')^2$. The online convex optimization framework permits this example, because the function $f_t(x) = L(q_t, x \cdot p_t)$ is a convex function of $x \in \mathbb{R}^n$. This problem of linear prediction with a convex loss function has been well studied (e.g., [CBL06]), and hence one would prefer to use the near-optimal algorithms that have been developed especially for that problem. We mention this application only to point out the generality of the online convex optimization framework.

Example 3: Portfolio management. In this setting, for each $t = 1, \ldots, T$ an online investor chooses a distribution $x_t$ over $n$ stocks in the market. The market outcome at iteration $t$ is captured by a price-relatives vector $c_t$, such that the loss to the investor is $-\log(x_t \cdot c_t)$ (see Cover [Cov91] for motivation and more detail regarding the model). Again, the online convex optimization framework permits this example, because the function $f_t(x) = -\log(x \cdot c_t)$ is a convex function of $x \in \mathbb{R}^n$.
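To make the protocol concrete, here is a minimal sketch, not from the paper, of the online convex optimization game, instantiated with the loss functions of Examples 2 and 3. The `learner` object and its `predict`/`update` interface are hypothetical names introduced only for illustration.

```python
import numpy as np

def play_online_convex_optimization(learner, loss_fns):
    """Run the online protocol: at round t the learner commits to a point
    x_t knowing only f_1, ..., f_{t-1}, and only then pays f_t(x_t)."""
    total_loss = 0.0
    for f_t in loss_fns:
        x_t = learner.predict()   # choose x_t in the feasible set K
        total_loss += f_t(x_t)    # incur the cost f_t(x_t)
        learner.update(f_t)       # only now may the learner observe f_t
    return total_loss

# Example 2: linear prediction with quadratic loss, f_t(x) = (q_t - x . p_t)^2.
def prediction_loss(p_t, q_t):
    return lambda x: (q_t - x @ p_t) ** 2

# Example 3: portfolio management, f_t(x) = -log(x . c_t), where x is a
# distribution over n stocks and c_t is the price-relatives vector.
def portfolio_loss(c_t):
    return lambda x: -np.log(x @ c_t)
```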
This paper shows how three seemingly different approaches can be used to achieve logarithmic regret in the case of some higher-order derivative assumptions on the functions. The algorithms are relatively easy to state. In some cases, the analysis is simple, and in others it relies on a carefully constructed potential function due to Agarwal and Hazan [AH05]. Lastly, our gradient descent results relate to previous analyses of stochastic gradient descent [Spa03], which is known to converge at a rate of $1/T$ for $T$ steps of gradient descent under various assumptions on the distribution over functions. Our results imply a $\log(T)/T$ convergence rate for the same problems, though, as is common in the online setting, the assumptions and guarantees are simpler and stronger than their stochastic counterparts.

1.1 Our results

The regret of the decision maker at time $T$ is defined to be its total cost minus the cost of the best single decision, where the best is chosen with the benefit of hindsight:

$$\mathrm{regret}_T = \mathrm{regret} = \sum_{t=1}^{T} f_t(x_t) - \min_{x \in K} \sum_{t=1}^{T} f_t(x).$$

A standard goal in machine learning and game theory is to achieve algorithms with guaranteed low regret (this goal is also motivated by psychology). Zinkevich showed that one can guarantee $O(\sqrt{T})$ regret for an arbitrary sequence of differentiable convex functions of bounded gradient, which is tight up to constant factors. In fact, $\Omega(\sqrt{T})$ regret is unavoidable even when the functions come from a fixed distribution rather than being chosen adversarially.

Variable             Meaning
$K \subseteq \mathbb{R}^n$   the convex compact feasible set
$D \ge 0$            the diameter of $K$: $D = \sup_{x,y \in K} \|x - y\|$
$f_1, \ldots, f_T$   sequence of $T$ twice-differentiable convex functions $f_t : \mathbb{R}^n \to \mathbb{R}$
$G \ge 0$            $\|\nabla f_t(x)\| \le G$ for all $x \in K$, $t \le T$ (in one dimension, $|f_t'(x)| \le G$)
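For orientation, here is a minimal sketch, in the notation of the table above, of Zinkevich's online gradient descent referenced earlier: at each round it steps against the gradient of the last observed cost function and projects back onto $K$. The step-size schedule $\eta_t = D/(G\sqrt{t})$ and the ball-projection helper are standard illustrative choices, not prescribed by this excerpt.

```python
import numpy as np

def project_onto_ball(y, radius=1.0):
    """Euclidean projection onto a ball of the given radius, as one concrete
    feasible set K; any convex compact K with a projection oracle works."""
    norm = np.linalg.norm(y)
    return y if norm <= radius else (radius / norm) * y

def online_gradient_descent(grads, x1, D, G, project=project_onto_ball):
    """Sketch of online gradient descent [Zin03]:
    x_{t+1} = Pi_K(x_t - eta_t * grad f_t(x_t)) with eta_t = D / (G * sqrt(t)),
    which guarantees O(DG * sqrt(T)) regret when ||grad f_t(x)|| <= G on a
    feasible set K of diameter D."""
    x, decisions = x1, []
    for t, grad_f_t in enumerate(grads, start=1):
        decisions.append(x)                    # play x_t, then observe f_t
        eta_t = D / (G * np.sqrt(t))           # standard 1/sqrt(t) step size
        x = project(x - eta_t * grad_f_t(x))   # projected gradient step
    return decisions
```

The logarithmic-regret gradient descent result of this paper keeps this skeleton; roughly speaking, the curvature assumptions allow the step size to shrink on the order of $1/t$ rather than $1/\sqrt{t}$, which is what improves the regret from $O(\sqrt{T})$ to $O(\log T)$.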