On the Choice of Random Directions for Stochastic Approximation Algorithms

James Theiler and Jarod Alper

This work was supported by the Laboratory Directed Research and Development (LDRD) program at Los Alamos National Laboratory. J. Theiler is a technical staff member in the Space and Remote Sensing Sciences Group, MS-B244, Los Alamos National Laboratory, Los Alamos, NM 87545. Email: jt@lanl.gov (Phone: 505-665-5682; fax: 505-665-4414). J. Alper recently graduated from Brown University with a degree in Computer Science; he was a graduate research assistant at Los Alamos when this work was done. Email: JarodAlper@alumni.brown.edu.

June 27, 2003. DRAFT

Abstract

We investigate variants of the Kushner-Clark Random Direction Stochastic Approximation (RDSA) algorithm for optimizing noisy loss functions in high-dimensional spaces. These variants employ different strategies for choosing random directions. The most popular approach is random selection from a Bernoulli distribution, which for historical reasons also goes by the name Simultaneous Perturbation Stochastic Approximation (SPSA). But viable alternatives include an axis-aligned distribution, a normal distribution, and a uniform distribution on a spherical shell. Although there are special cases where the Bernoulli distribution is optimal, there are other cases where it performs worse than the alternatives. We find that for generic loss functions that are not aligned to the coordinate axes, the average asymptotic performance depends only on the radial fourth moment of the distribution of directions, and is identical for the Bernoulli, axis-aligned, and spherical shell distributions. Of these variants, the spherical shell is optimal in the sense of minimum variance over random orientations of the loss function with respect to the coordinate axes. We also show that for unaligned loss functions, the performance of the Kiefer-Wolfowitz-Blum Finite Difference Stochastic Approximation (FDSA) algorithm is asymptotically equivalent to that of the RDSA algorithms, and we observe numerically that the pre-asymptotic performance of FDSA is often superior. We also introduce a "quasirandom" selection process which exhibits the same asymptotic performance, but empirically is observed to converge to the asymptote more rapidly.

Index Terms

stochastic approximation, optimization, noisy loss function, random direction, finite difference, simultaneous perturbation

I. INTRODUCTION

Stochastic approximation provides a simple and effective approach for finding roots and minima of functions whose evaluations are contaminated with noise. Consider a smooth^1 p-dimensional loss function L: ℝ^p → ℝ, with gradient g: ℝ^p → ℝ^p. Assume that L has a unique^2 local (and therefore global) minimum x* ∈ ℝ^p. That is, L(x*) ≤ L(x) for all x ∈ ℝ^p, and g(x) = 0 iff x = x*. If a direct (but possibly noisy) estimator ĝ(x) of the gradient function is available, then the Robbins-Monro [1] algorithm (as extended by Blum [2] to multidimensional systems) estimates a root of g(x) with the following recursion:

    x_{k+1} = x_k - a_k ĝ(x_k),                                                      (1)

where a_k is a sequence of positive numbers that satisfies Σ_{k=1}^∞ a_k = ∞ and lim_{k→∞} a_k = 0. In particular, a_k = a_o/k^α with 0 < α ≤ 1 satisfies these conditions. If the estimator is unbiased, that is, E{ĝ(x)} = g(x), then x_k will converge to the root of g. In particular, it can be shown that E{(x_k - x*)²} = O(k^{-α}) for large k.

Kiefer and Wolfowitz [3] introduced an algorithm in which the gradient is estimated by finite differences (Blum [2] also extended this result to multiple dimensions). This finite difference stochastic approximation (FDSA) algorithm employs an estimator for the gradient whose ith component is given by

    ĝ_i(x) = [L̂(x + c e_i) - L̂(x - c e_i)] / (2c),                                  (2)

where e_i is the unit vector along the ith axis, and L̂ is a noisy measurement of the loss function. Since this is done for each component, it requires 2p measurements of the loss function for each iteration. For c > 0, Eq. (2) is in general a biased estimator of the gradient. Convergence is achieved by providing a decreasing sequence c_k with lim_{k→∞} c_k = 0 so that the bias is eventually eliminated. However, the cost of using a smaller c is a larger variance, so the rate at which c_k → 0 must be carefully chosen.

In the FDSA algorithm, separate estimates are computed for each component of the gradient. This means that a p-dimensional problem requires at least 2p evaluations of the loss function per iteration. By contrast, the random direction stochastic approximation (RDSA) algorithms estimate only one component of the gradient per iteration. Let δ ∈ ℝ^p be a direction vector. In Kushner and Clark [4], δ is treated as a unit vector with |δ|² = Σ_i δ_i² = 1, and since it is a random direction, it satisfies E{δδᵀ} = I/p. Chin [5] prefers the convention that δ have radius^3 √p, so that |δ|² = Σ_i δ_i² = p and E{δδᵀ} = I. Regardless of the convention for δ, both authors write the RDSA formula as

    x_{k+1} = x_k - a_k [ (L̂(x_k + c_k δ_k) - L̂(x_k - c_k δ_k)) / (2 c_k) ] δ_k,      (3)

but it bears remarking that the formulas are not equivalent. Using the |δ|² = p convention, the above formula corresponds directly to the Robbins-Monro formulation in Eq. (1). With the |δ|² = 1 convention, however, the above formula corresponds to x_{k+1} = x_k - (1/p) a_k ĝ(x_k), which by a simple rescaling of a_k is equivalent^4 to Eq. (1). To facilitate comparisons with the more recent work, we will take the convention that |δ|² = p. Several choices are available for choosing the random distribution

Footnotes:
^1 To simplify exposition, we take L to be infinitely differentiable, but remark that many of the results only require that L be s-times differentiable, where s depends on the particular result.
^2 Stochastic approximation algorithms can still be useful for loss functions with multiple local minima, but formal results are more readily obtained if there is a single local minimum.
^3 Chin [5] mistakenly says the radius is p.
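To make the cost contrast between the two estimators concrete, the recursions in Eqs. (1)-(3) can be sketched in a few lines of code. The following Python illustration is not part of the paper: the quadratic test loss, the gain constants (a_o, α, c_o, γ), and all function names are our own assumptions, chosen only to show that FDSA spends 2p loss evaluations per iteration where Bernoulli RDSA (i.e., SPSA, in the |δ|² = p convention) spends 2.

```python
import numpy as np

def fdsa_gradient(L, x, c, rng):
    # FDSA estimator (Eq. 2): a central difference along each coordinate
    # axis e_i. Costs 2p evaluations of the noisy loss per iteration.
    p = len(x)
    g = np.zeros(p)
    for i in range(p):
        e = np.zeros(p)
        e[i] = 1.0
        g[i] = (L(x + c * e) - L(x - c * e)) / (2.0 * c)
    return g

def rdsa_gradient(L, x, c, rng):
    # RDSA estimator (Eq. 3) with a Bernoulli +/-1 direction (the SPSA
    # choice), which satisfies the |delta|^2 = p convention. Costs only
    # 2 evaluations of the noisy loss per iteration, regardless of p.
    delta = rng.choice([-1.0, 1.0], size=len(x))
    return (L(x + c * delta) - L(x - c * delta)) / (2.0 * c) * delta

def minimize(L, x0, grad, n_iter=2000,
             a0=0.1, alpha=1.0, c0=0.5, gamma=1.0 / 6.0, seed=0):
    # Robbins-Monro recursion (Eq. 1): x_{k+1} = x_k - a_k * ghat(x_k),
    # with assumed gain sequences a_k = a0/k^alpha and c_k = c0/k^gamma.
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for k in range(1, n_iter + 1):
        x = x - (a0 / k**alpha) * grad(L, x, c0 / k**gamma, rng)
    return x

# Toy problem (our assumption): noisy quadratic loss in p = 5 dimensions,
# with unique minimum x* = 0 and additive measurement noise.
noise = np.random.default_rng(1)
def noisy_quadratic(x):
    return float(np.sum(x**2) + 0.01 * noise.normal())

x_rdsa = minimize(noisy_quadratic, np.ones(5), rdsa_gradient)  #   2 loss calls/iter
x_fdsa = minimize(noisy_quadratic, np.ones(5), fdsa_gradient)  # 2*5 loss calls/iter
# Both norms should end up well below the starting norm sqrt(5) ~ 2.24.
print(np.linalg.norm(x_rdsa), np.linalg.norm(x_fdsa))
```

Note that for this purely quadratic loss the central difference is exact up to measurement noise, so both estimators obey the same expected recursion E{x_{k+1}} = (1 - 2 a_k) E{x_k}; they differ in per-iteration cost and in the variance contributed by the random directions, which is exactly the trade-off the paper studies.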