Computer Science Technical Report

Robust Reinforcement Learning Control with Static and Dynamic Stability^a

R. Matthew Kretchmar, Peter M. Young, Charles W. Anderson, Douglas C. Hittle, Michael L. Anderson, Christopher C. Delnero
Colorado State University
July 20, 2000
Technical Report CS-00-102

^a From a thesis submitted to the Academic Faculty of Colorado State University in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science. This work was partially supported by the National Science Foundation through grants CMS-9804757 and 9732986.

Computer Science Department
Colorado State University
Fort Collins, CO 80523-1873
Phone: (970) 491-5792
Fax: (970) 491-2466

Abstract

Robust control theory is used to design stable controllers in the presence of uncertainties. By replacing nonlinear and time-varying aspects of a neural network with uncertainties, a robust reinforcement learning procedure results that is guaranteed to remain stable even as the neural network is being trained. The behavior of this procedure is demonstrated and analyzed on two simple control tasks. For one task, reinforcement learning with and without robust constraints results in the same control performance, but at intermediate stages the system without robust constraints goes through a period of unstable behavior that is avoided when the robust constraints are included.
1 Introduction

The design of a controller is based on a mathematical model that captures as much as possible all that is known about the plant to be controlled and that is representable in the chosen mathematical framework. The objective is not to design the best controller for the plant model, but for the real plant. Robust control theory achieves this goal by including in the model a set of uncertainties. When specifying the model in a Linear-Time-Invariant (LTI) framework, the nominal model of the system is LTI and "uncertainties" are added with gains that are guaranteed to bound the true gains of unknown, or known and nonlinear, parts of the plant. Robust control techniques are applied to the plant model augmented with uncertainties and candidate controllers to analyze the stability of the true system. This is a significant advance in practical control, but designing a controller that remains stable in the presence of uncertainties limits the aggressiveness of the resulting controller, resulting in suboptimal control performance.

In this article, we describe an approach for combining robust control techniques with a reinforcement learning algorithm to improve the performance of a robust controller while maintaining the guarantee of stability. Reinforcement learning is a class of algorithms for solving multi-step, sequential decision problems by finding a policy for choosing sequences of actions that optimize the sum of some performance criterion over time [27]. They avoid the unrealistic assumption of known state-transition probabilities that limits the practicality of dynamic programming techniques. Instead, reinforcement learning algorithms adapt by interacting with the plant itself, taking each state, action, and new state observation as a sample from the unknown state transition probability distribution.

A framework must be established with enough flexibility to allow the reinforcement learning controller to adapt to a good control strategy. This flexibility implies that there are numerous undesirable control strategies also available to the learning controller; the engineer must be willing to allow the controller to temporarily assume many of these poorer control strategies as it searches for the better ones. However, many of the undesirable strategies may produce instabilities. Thus, our objectives for the approach described here are twofold. The main objective that must always be satisfied is stable behavior. The second objective is to add a reinforcement learning component to the controller to optimize the controller behavior on the true plant, while never violating the main objective.

While the vast majority of controllers are LTI due to the tractable mathematics and extensive body of LTI research, a non-LTI controller is often able to achieve greater performance than an LTI controller, because it is not saddled with the limitations of LTI. Two classes of non-LTI controllers are particularly useful for control: nonlinear controllers and adaptive controllers. However, nonlinear and adaptive controllers are difficult, and often impossible, to study analytically. Thus, the guarantee of stable control inherent in LTI designs is sacrificed for non-LTI controllers.

Neural networks as controllers, or neuro-controllers, constitute much of the recent non-LTI control research. Because neural networks are both nonlinear and adaptive, they can realize superior control compared to LTI. However, most neuro-controllers are static in that they respond only to current input, so they may not offer any improvement over the dynamic nature of LTI designs. Little work has appeared on dynamic neuro-controllers. Stability analysis of neuro-controllers has been very limited, which greatly limits their use in real applications.

The stability issue for systems with neuro-controllers encompasses two aspects. Static stability is achieved when the system