Robust reinforcement learning control with static

Computer Science Technical Report

Robust Reinforcement Learning Control with Static and Dynamic Stability (a)

R. Matthew Kretchmar, Peter M. Young, Charles W. Anderson, Douglas C. Hittle, Michael L. Anderson, Christopher C. Delnero
Colorado State University
July 20, 2000
Technical Report CS-00-102

(a) From a thesis submitted to the Academic Faculty of Colorado State University in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science. This work was partially supported by the National Science Foundation through grants CMS-9804757 and 9732986.

Computer Science Department
Colorado State University
Fort Collins, CO 80523-1873
Phone: (970) 491-5792
Fax: (970) 491-2466

Abstract

Robust control theory is used to design stable controllers in the presence of uncertainties. By replacing nonlinear and time-varying aspects of a neural network with uncertainties, a robust reinforcement learning procedure results that is guaranteed to remain stable even as the neural network is being trained. The behavior of this procedure is demonstrated and analyzed on two simple control tasks. For one task, reinforcement learning with and without robust constraints results in the same control performance, but at intermediate stages the system without robust constraints goes through a period of unstable behavior that is avoided when the robust constraints are included.

1 Introduction

The design of a controller is based on a mathematical model that captures as much as possible all that is known about the plant to be controlled and that is representable in the chosen mathematical framework. The objective is not to design the best controller for the plant model, but for the real plant. Robust control theory achieves this goal by including in the model a set of uncertainties. When specifying the model in a Linear-Time-Invariant (LTI) framework, the nominal model of the system is LTI and "uncertainties" are added with gains that are guaranteed to bound the true gains of unknown, or known and nonlinear, parts of the plant. Robust control techniques are applied to the plant model augmented with uncertainties and candidate controllers to analyze the stability of the true system. This is a significant advance in practical control, but designing a controller that remains stable in the presence of uncertainties limits the aggressiveness of the resulting controller, resulting in suboptimal control performance.

In this article, we describe an approach for combining robust control techniques with a reinforcement learning algorithm to improve the performance of a robust controller while maintaining the guarantee of stability. Reinforcement learning is a class of algorithms for solving multi-step, sequential decision problems by finding a policy for choosing sequences of actions that optimize the sum of some performance criterion over time [27]. They avoid the unrealistic assumption of known state-transition probabilities that limits the practicality of dynamic programming techniques. Instead, reinforcement learning algorithms adapt by interacting with the plant itself, taking each state, action, and new state observation as a sample from the unknown state transition probability distribution.

A framework must be established with enough flexibility to allow the reinforcement learning controller to adapt to a good control strategy. This flexibility implies that there are numerous undesirable control strategies also available to the learning controller; the engineer must be willing to allow the controller to temporarily assume many of these poorer control strategies as it searches for the better ones. However, many of the undesirable strategies may produce instabilities. Thus, our objectives for the approach described here are twofold. The main objective that must always be satisfied is stable behavior. The second objective is to add a reinforcement learning component to the controller to optimize the controller behavior on the true plant, while never violating the main objective.

While the vast majority of controllers are LTI due to the tractable mathematics and extensive body of LTI research, a non-LTI controller is often able to achieve greater performance than an LTI controller, because it is not saddled with the limitations of LTI. Two classes of non-LTI controllers are particularly useful for control: nonlinear controllers and adaptive controllers. However, nonlinear and adaptive controllers are difficult, and often impossible, to study analytically. Thus, the guarantee of stable control inherent in LTI designs is sacrificed for non-LTI controllers.

Neural networks as controllers, or neuro-controllers, constitute much of the recent non-LTI control research. Because neural networks are both nonlinear and adaptive, they can realize superior control compared to LTI. However, most neuro-controllers are static in that they respond only to current input, so they may not offer any improvement over the dynamic nature of LTI designs. Little work has appeared on dynamic neuro-controllers. Stability analysis of neuro-controllers has been very limited, which greatly limits their use in real applications.

The stability issue for systems with neuro-controllers encompasses two aspects. Static stability is achieved when the system
